HP OpenView Network Node Manager

Performance and Configuration Guide

Windows, HP-UX, Linux, and Solaris operating systems

October, 2004

© Copyright 1996-2004 Hewlett-Packard Development Company, L.P.

Legal Notices

Warranty. Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be held liable for errors contained herein or direct, indirect, special, incidental or consequential damages in connection with the furnishing, performance, or use of this material. A copy of the specific warranty terms applicable to your Hewlett-Packard product can be obtained from your local Sales and Services Office.

Restricted Rights Legend. Use, duplication or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause in DFARS 252.227-7013. Hewlett-Packard Company, United States of America. Rights for non-DOD U.S. Government Departments and Agencies are as set forth in FAR 52.227-19(c)(1,2).

Copyright Notices. © Copyright 1996-2004 Hewlett-Packard Development Company, L.P. No part of this document may be copied, reproduced, or translated to another language without the prior written consent of Hewlett-Packard Company. The information contained in this material is subject to change without notice.

Trademark Notices. Microsoft® is a U.S. registered trademark of Microsoft Corporation. Windows® and MS Windows® are U.S. registered trademarks of Microsoft Corporation.

Contents

1 Introduction

2 Management System Sizing
   Network Node Manager Advanced Edition
   Network Node Manager Starter Edition
   Scaling to Larger Standalone Environments
   OVW Sessions
   Management Consoles

3 NNM AE Deployment Scenarios
   Summary of Environments
   Site Details

4 Managing Large Networks with NNM Advanced Edition
   Targeting Your Critical Devices
   Zone Discovery
   Incremental Discovery
   DNS Performance Issues

5 Polling Considerations
   Device Status Polling
   Other Polling Types
   Device Discovery Polling Times
   Polling Limits in Large Environments: Test Cases and Results
   Event Processing Performance
   Event Reduction
   Event Rate

6 System Tuning
   NNM Advanced Edition System Tuning
   Disks
   NNM Starter Edition Memory Requirements

7 NNM Application Tuning
   The On-Demand Submap Feature
   The ovwdb -n Option
   Filtering
   Reclaiming Memory by Restarting
   Management Consoles
   Backup and Recovery Performance
   Using EMANATE SNMP Agent on Windows
   Other Factors Affecting NNM Performance

8 Distributed Management
   Topology Synchronization
   Collection Station Synchronization Duration

9 DIDM Testing and Results
   NNM Distributed Environment Test Setup
   Data Collected
   Initial Synchronization
   Resynchronization
   Event Rate
   Failover

10 NNM SE Sizing Considerations
   Node and Object Calculation
   Nodes
   The Relationship of Nodes and Interfaces
   Managed Nodes and Interfaces
   Objects
   Determining the Number of Objects
   Estimating the Number of Objects

1 Introduction


This document contains performance characterization and configuration information for HP OpenView Network Node Manager (NNM) version 7.5, Advanced and Starter Editions. The information in this guide will assist you in sizing systems dedicated to running NNM AE and SE in your environment. This document assumes you are acquainted with the product structure, terminology, and architecture of NNM. If you are unfamiliar with these aspects of NNM, refer to Managing Your Network with HP OpenView Network Node Manager and A Guide to Scalability and Distribution for Network Node Manager.

Evaluating scalability and sizing differs between Network Node Manager Advanced Edition (NNM AE) and Network Node Manager Starter Edition (NNM SE). NNM AE's Extended Topology enables the management of Layer 2 devices, in addition to providing views for HSRP, VLANs, OSPF backbone, and IPv6 networks. This dramatically changes the landscape for evaluating network complexity. NNM SE, which is restricted to Layer 3 management, views complexity in terms of the number of nodes, where a node typically has an average of two to four interfaces.

The Network Node Manager Advanced Edition section in Chapter 2, Management System Sizing, discusses the key variables that determine the complexity of networks at the Layer 2 level. It also provides guidelines concerning scalability through the use of example network environments.

2 Management System Sizing


Network Node Manager Advanced Edition

Because of NNM AE's Extended Topology capability, evaluating network complexity for NNM AE involves more than just the number of nodes: it also involves Layer 2 entities such as interfaces and physical connections. For example, a network of one thousand nodes consisting mostly of switches and routers, with each node averaging fifteen interfaces, would be far easier to manage than a network of two hundred nodes with each node averaging two hundred interfaces (assuming all the interfaces are connected to devices). The complexity would further increase with the introduction of connectivity, switch meshes, and VLANs. The key variables that influence the complexity of a network and the scale limit for Layer 2 device management are:

· Number of managed nodes
· Number of interfaces
· Number of VLANs
· Number of switch meshes
· Number of connected ports (indicating neighbors)

Together, these key variables add up to the number of objects within NNM AE Extended Topology's management domain. For example, a network with 100 nodes, 1000 interfaces, 100 VLANs, 100 switch meshes, and 1000 connected ports would generate 2300 objects in NNM AE's management domain. That is more objects than a network with 1000 nodes, 1000 interfaces, and 50 connected ports with no VLANs or switch meshes (2050 objects in total). Note that in practice, for most networks, the total number of managed nodes plus the interfaces on those nodes provides a good approximation of the system resources required to manage that environment.

In summary, the variables described above affect sizing estimates for an NNM AE deployment. Estimates must take into consideration the number of managed nodes in conjunction with the number of interfaces and how inter-connected the nodes are.
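Expressed as a rough formula (an informal restatement of the examples above, not an official sizing equation):

    total objects ≈ nodes + interfaces + VLANs + switch meshes + connected ports

For the first example network: 100 + 1000 + 100 + 100 + 1000 = 2300 objects.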


                                       Solaris       HP-UX (PA-RISC)   Windows 2000

Disk space (1000 nodes / 65000 interfaces)
  /opt/OV                              512 Mb        512 Mb            -
  /var/opt/OV                          2 Gb          2 Gb              -
  /etc/opt/OV                          40 Mb         40 Mb             -
  Approximate disk space required      3 Gb          3 Gb              3 Gb

Disk space per additional 1000 nodes
  /opt/OV                              3 Mb          3 Mb
  /var/opt/OV                          512 Mb        512 Mb
  /etc/opt/OV                          5 Mb          5 Mb
  Approximate additional disk space    520 Mb        520 Mb            520 Mb
  required

Memory (1000 nodes / 65000 interfaces)
  RAM (Minimum)                        512 Mb        512 Mb            512 Mb
  SWAP/Virtual Memory (Minimum)        512 Mb free   512 Mb free       512 Mb free
  RAM (Recommended)                    1 Gb          1 Gb              1 Gb
  SWAP/Virtual Memory (Recommended)    1 Gb          1 Gb              1 Gb

Memory per additional 1000 nodes       512 Mb        512 Mb            512 Mb

CPU                                    333 MHz and   333 MHz and       333 MHz and
                                       above         above             above

Table 2-1. NNM Advanced Edition Memory, Disk, Swap Space, CPU

Use the values in Table 2-1 as a starting point for sizing your NNM Advanced Edition server. These values should be sufficient for most standalone NNM AE systems managing up to 1000 nodes with up to 65000 interfaces.


You will need to increase these values for larger and more complex environments. To calculate the memory requirements for an NNM Advanced Edition server, add the recommended memory for the first 1000 nodes / 65,000 interfaces to a multiple of the recommended memory for each additional 1000 nodes / 65,000 interfaces in your environment. For example, if your network contains 3500 nodes and 227,000 interfaces, the calculation would be: 1 Gb + (3 x 512 Mb) = 2.5 Gb. For a network of 4800 nodes and 310,000 interfaces, the calculation would be 1 Gb + (4 x 512 Mb) = 3 Gb.
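Stated generally (an informal restatement of the examples above; round the additional node count up to the next multiple of 1000):

    recommended RAM = 1 Gb + ceil((nodes - 1000) / 1000) x 512 Mb

For the 3500-node example: 1 Gb + ceil(2500 / 1000) x 512 Mb = 1 Gb + 3 x 512 Mb = 2.5 Gb.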

Network Node Manager Starter Edition

Network Node Manager Starter Edition involves a simpler object-to-node calculation than does Advanced Edition. Ordinarily, the object-to-node ratio in an NNM SE environment is about 2.4. See Chapter 10, NNM SE Sizing Considerations, for more information on object-to-node ratios. Table 2-2 describes minimum requirements for available memory, free disk space, swap space, and hardware. These minimum values do not include operating system requirements or the requirements of other software applications. Note that the operating systems listed in the following table refer to the operating system revisions currently supported by NNM, as documented in the NNM Release Notes.


Platform           Memory   Free Disk Space   Swap Space                               Work Value   Other
HP-UX on PA-RISC   150 Mb   840 Mb            256 Mb free                              55           HP-UX PA-RISC workstation, CD-ROM drive
HP-UX on Itanium   170 Mb   730 Mb            256 Mb free                              55           HP-UX Itanium workstation, CD-ROM drive
Red Hat Linux      680 Mb   450 Mb            768 Mb free                              55           Pentium 450 MHz, CD-ROM drive
Solaris            210 Mb   730 Mb            256 Mb free                              55           Sun SPARC workstation, CD-ROM drive
Windows 2000       100 Mb   550 Mb (NTFS)     minimum 256 Mb free, 512 Mb recommended  55           Intel Pentium 450 MHz, CD-ROM drive

Table 2-2. NNM Starter Edition Memory, Disk, Swap Space, Work Value, CPU

Use the values in Table 2-2 as a starting point for sizing your NNM Starter Edition server. These values should be sufficient for most standalone NNM SE systems managing up to 1000 nodes. You will need to increase these values for larger and more complex environments. The term "work value" refers to an arbitrary measure of processing power needed to perform NNM tasks. The comparative work values can assist you in determining which class of system (Small, Medium, Large, Extra-Large) is necessary to perform these tasks. See Table 2-4.



Scaling to Larger Standalone Environments

You can use Table 2-3 together with Table 2-2 to approximate the additional memory, free disk space, and work value needed to accommodate topologies larger than 1000 nodes.

Additional per 1000 nodes   Memory   Free Disk Space   Work Value
HP-UX on PA-RISC            30 Mb    30 Mb             30
HP-UX on Itanium            5 Mb     30 Mb             30
Red Hat Linux               60 Mb    100 Mb            30
Solaris                     65 Mb    85 Mb             30
Windows 2000                30 Mb    25 Mb             30

Table 2-3. Incremental Memory, Free Disk Space, Work Values

These values assume a "normal" ratio of [ networks + segments + nodes + interfaces ] / nodes of 2.4. (See Chapter 10, NNM SE Sizing Considerations, for more information on object-to-node ratios.) If your managed environment has a higher ratio, you will need to increase these values proportionately.

Example 1: NNM (with normal object-to-node ratios) needs to manage 1000 nodes. How much available memory, free disk space, and hardware processing power is needed to handle this environment on a Windows 2000 system?

Solution: You can determine the available memory and free disk space needed for this environment from Table 2-2: available memory (not including the operating system) of 100 Mb and free disk space of 550 Mb. You can also find the hardware processing power (work value) in Table 2-2: 55. Refer to Table 2-4 for an appropriate "small"-sized hardware system.


Example 2: NNM (with normal object-to-node ratios) needs to manage 3000 nodes. How much available memory, free disk space, and hardware processing power are necessary to handle this environment on an HP-UX PA-RISC system?

Solution: You can determine the minimum available memory needed for 1000 nodes from Table 2-2: 150 Mb. Adding another 2000 nodes increases this by 2 x 30 Mb, for a total of 210 Mb of available memory. The minimum free disk space, from Table 2-2, is 840 Mb. Adding another 2000 nodes increases this by 2 x 30 Mb, for a total of 900 Mb of free disk space. The hardware processing power (work value) is the minimum (55) plus the additional work value for 2000 additional nodes (2 x 30), for a total of 115. Refer to Table 2-4 for an appropriate "medium"-sized hardware system.
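The same arithmetic generalizes to any platform in Tables 2-2 and 2-3 (an informal restatement of the worked examples above, not an official formula):

    requirement = base value (Table 2-2) + ((nodes - 1000) / 1000) x increment (Table 2-3)

applied separately to memory, free disk space, and work value; the resulting work value is then matched against the class ranges in Table 2-4.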


Class         Work Units   Examples
Small         0-80         HP xw3100 workstation
                           · single 3.2 GHz Intel Pentium 4 processor
                           HP rp2430 PA-RISC server
                           · single 650 MHz PA-8700 processor
                           HP zx2000 Intel Itanium-based workstation
                           · single 1.40 GHz Intel Itanium 2 processor
                           Sun Ultra 60/450MHz
Medium        80-150       HP ProLiant ML330 server
                           · dual 3.06 GHz Intel processors
                           HP j6750 PA-RISC workstation
                           · dual 750 MHz PA-8700 processors
                           HP Integrity rx2600 server
                           · dual 1.0 GHz Intel Itanium 2 processors
                           Sun Enterprise 4500
Large         150-400      HP ProLiant ML570 server
                           · multiple 2.8 GHz Intel Xeon processors
                           HP rp7410 PA-RISC server
                           · multiple 875 MHz PA-8700+ processors
                           HP Integrity rx7620 server
                           · multiple 1.3 GHz Intel Itanium 2 processors
Extra-Large   400+         HP rp8400 server
                           · multiple 875 MHz PA-8700+ processors
                           HP Superdome PA-RISC server
                           · multiple 875 MHz PA-8700+ processors
                           HP Integrity Superdome server
                           · multiple 1.5 GHz Itanium 2 processors

Table 2-4. Work Values and System Classes

NOTE Estimates provided in this document assume that the system is running basic operating system processes and Network Node Manager only. The addition of any other applications, services, et cetera, requires additional system resources not covered by these estimates.


OVW Sessions

Running multiple ovw sessions on the same management station will require additional memory. Table 2-5 provides approximate memory requirements per ovw session for three different sized topologies.

Platform           1000 nodes   2000 nodes   5000 nodes
HP-UX on PA-RISC   11 Mb        18 Mb        29 Mb
HP-UX on Itanium   33 Mb        41 Mb        55 Mb
Red Hat Linux      12 Mb        17 Mb        28 Mb
Solaris            18 Mb        25 Mb        37 Mb
Windows 2000       11 Mb        23 Mb        34 Mb

Table 2-5. Approximate ovw memory requirements

When an ovw session starts, synchronization is performed to ensure that the map accurately reflects the topology data. The amount of time needed for this synchronization phase increases with larger topologies. Table 2-6 lists approximate synchronization times for ovw sessions on different platforms for different topologies. Actual synchronization times will, of course, vary based on the number of changes in the environment as well as your server configuration and network topology.

Platform           1000 nodes    2000 nodes    5000 nodes
HP-UX on PA-RISC   <5 seconds    <15 seconds   <120 seconds
HP-UX on Itanium   <5 seconds    <20 seconds   <120 seconds
Red Hat Linux      <10 seconds   <40 seconds   <120 seconds
Solaris            <10 seconds   <30 seconds   <350 seconds
Windows 2000       <5 seconds    <25 seconds   <250 seconds

Table 2-6. Approximate synchronization times for ovw


Management Consoles

To provide access to more operators without straining the management station with multiple ovw sessions, you can use management consoles. Management consoles offload the ovw process from the management station but operate off of the management station's topology database. Testing has shown that you can have 15 to 25 management consoles simultaneously monitoring your network. Refer to A Guide to Scalability and Distribution for HP OpenView Network Node Manager for more information on management consoles.

3 NNM AE Deployment Scenarios


This chapter presents example network environment scenarios to illustrate how NNM AE has been successfully deployed in customer environments. Note that NNM AE worked well in the largest tested environment, indicating that even larger scaling should be possible.

Summary of Environments

Scenarios   Nodes   Interfaces   Physical Connections   VLANs   Meshes   Total Objects
Site A      374     1098         705                    4       22       2203
Site B      105     2860         107                    0       8        3132
Site C      100     2888         98                     49      15       3423
Site D      808     9717         312                    47      2        12036
Site E      1314    16440        691                    32      66       19713
Site F      700     31715        676                    121     64       35031
Site G      685     30338        662                    122     81       33424
Site H      1009    55023        790                    493     3        58993
Site J      979     54524        701                    593     4        58503
Site K      685     62755        787                    152     104      65049
Site L      508     24536        312                    279     1        26193
Site M      456     15431        810                    166     77       17654
Site N      3595    114337       7225                   2270    1260     131054
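One way to read this table is through the interfaces-per-node ratio discussed in Chapter 2: for example, Site D averages about 12 interfaces per node (9717 / 808), while Site K averages about 92 (62755 / 685), so Site K represents a far more complex Layer 2 environment even though it has fewer nodes.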

Site Details

Site A

Although this environment was relatively simple, it had a high number of devices.

CPU Speed/Number of CPUs    single CPU
Amount of Memory            1 Gb
Amount of Swap Space        6 Gb
Amount of Disk Space        50 Gb
Machine Model               SunBlade 150
Operating System Platform   Solaris 2.8 + all patches
Number of Discovery Zones   1
Total Discovery Time        1 hour


Site B

The NNM server at this site was a Sun Solaris system.

Site C

This environment had a very high number of HSRP groups.

CPU Speed/Number of CPUs    single CPU
Amount of Memory            500 Mb
Amount of Swap Space        700 Mb
Amount of Disk Space        7 Gb
Machine Model               SunOS sun4u SPARC SUNW, Ultra-2
Operating System Platform   Solaris 2.8 Generic_108528-17
Number of Discovery Zones   1
Total Discovery Time        18 minutes 30 seconds

Site D

This environment had a very low interface count per device (due to the fairly high number of managed end nodes) and many HSRP groups.

CPU Speed/Number of CPUs    750 MHz / 4-way
Amount of Memory            6 Gb
Amount of Swap Space        7 Gb
Amount of Disk Space        110 Gb
Machine Model               Sun4u, SunFire 3800
Operating System Platform   Solaris 2.8
Number of Discovery Zones   1
Total Discovery Time        1 hour 9 minutes 48 seconds


Site E

This environment had exclusively Cisco equipment.

CPU Speed/Number of CPUs    240 MHz / single CPU
Amount of Memory            2 Gb
Amount of Swap Space        3 Gb
Amount of Disk Space        12 Gb
Machine Model               HP 9000/800 D390
Operating System Platform   HP-UX 11.00
Number of Discovery Zones   4
Total Discovery Time        4 hours 25 minutes 28 seconds

Site F

The majority of devices in this reasonably large environment were Cisco and HP ProCurve.

CPU Speed/Number of CPUs    240 MHz / 4-way
Amount of Memory            3843 Mb
Amount of Swap Space        1 Gb
Amount of Disk Space        20 Gb
Machine Model               HP 9000/800/K580
Operating System Platform   HP-UX 11.0
Number of Discovery Zones   10
Total Discovery Time        5 hours 30 minutes 9 seconds

Site G

CPU Speed/Number of CPUs    240 MHz / 4-way
Amount of Memory            3843 Mb
Amount of Swap Space        1 Gb
Amount of Disk Space        20 Gb
Machine Model               HP 9000/800/K580
Operating System Platform   HP-UX 11.0
Number of Discovery Zones   10
Total Discovery Time        4 hours 28 minutes


Site H

This large-scale environment had a very large number of Extreme devices. A single discovery zone was not effective in this environment: system resources such as memory and swap space were not sufficient to complete a single-zone discovery. The use of multiple discovery zones in this environment (a smallish NNM AE server monitoring a large-scale network) clearly illustrates the benefits of zone discovery for less powerful NNM AE servers.

CPU Speed/Number of CPUs    440 MHz / 2-way
Amount of Memory            1 Gb
Amount of Swap Space        2 Gb
Amount of Disk Space        10 Gb
Machine Model               HP 9000/800/L2000-44
Operating System Platform   HP-UX 11.11
Number of Discovery Zones   20
Total Discovery Time        5 hours 50 minutes

Site J

This environment had a complex network with a large number of VLANs.

CPU Speed/Number of CPUs    440 MHz / 2-way
Amount of Memory            1 Gb
Amount of Swap Space        2 Gb
Amount of Disk Space        10 Gb
Machine Model               HP 9000/800/L2000-44
Operating System Platform   HP-UX 11.11
Number of Discovery Zones   20
Total Discovery Time        5 hours 50 minutes


Site K

This environment was an excellent example of a configuration where single-zone discovery worked well. The network was very flat, with many switches and only one router.

CPU Speed/Number of CPUs    900 MHz / 2-way
Amount of Memory            2 Gb
Amount of Swap Space        4 Gb
Amount of Disk Space        18 Gb (mirrored)
Machine Model               SunFire 280R
Operating System Platform   Solaris 2.8
Number of Discovery Zones   1
Total Discovery Time        3 hours 44 minutes 29 seconds

Site L

This was a very large environment consisting of Cisco and Extreme devices and a large number of VLANs.

CPU Speed/Number of CPUs    400 MHz / 4-CPU
Amount of Memory            2.25 Gb
Amount of Swap Space        6.4 Gb
Amount of Disk Space        25 Gb
Machine Model               ProLiant
Operating System Platform   Windows 2000 Server
Number of Discovery Zones   8
Total Discovery Time        4 hours 3 minutes 57 seconds


Site M

This environment had a large number of VLANs and mostly Cisco and Nortel routers and switches.

CPU Speed/Number of CPUs    1.4 GHz / 2 CPU
Amount of Memory            1.3 Gb
Amount of Swap Space        4.4 Gb
Amount of Disk Space        40 Gb
Machine Model               Compaq ProLiant DL360 G2
Operating System Platform   Windows 2000 Server
Number of Discovery Zones   15
Total Discovery Time        1 hour 1 minute 43 seconds

Site N

This environment had a large number of Cisco devices and a large number of VLANs and meshes. Two different servers were tested in this environment: an 8-CPU, 549 MHz HP ProLiant server, and a 2-CPU, 2.8 GHz HP ProLiant DL380 G3 server. The comparison between discovery times on these two servers clearly illustrates the performance improvement possible with a faster machine.

Server 1:

CPU Speed/Number of CPUs    549 MHz / 8 CPU
Amount of Memory            3 Gb
Amount of Swap Space        9.6 Gb
Amount of Disk Space        60 Gb
Machine Model               Compaq ProLiant
Operating System Platform   Windows 2000 Server
Number of Discovery Zones   44
Total Discovery Time        38 hours 3 minutes 38 seconds


Server 2:

CPU Speed/Number of CPUs    2.8 GHz / 2 CPU
Amount of Memory            1.6 Gb
Amount of Swap Space        5 Gb
Amount of Disk Space        60 Gb
Machine Model               ProLiant DL380 G3
Operating System Platform   Windows 2000 Server
Number of Discovery Zones   67
Total Discovery Time        19 hours 36 minutes 8 seconds

4 Managing Large Networks with NNM Advanced Edition


The Extended Topology capability of NNM AE is very CPU- and memory-intensive during the discovery and modeling phases. Tuning may be helpful when deploying NNM AE Extended Topology to manage fairly large networks. Tuning might involve the use of multiple discovery zones, the Exclusion List filtering feature (using the bridge.noDiscover file), or other techniques. The Automatic Zone Configuration feature can be used to automatically create zones that optimize performance for the specific management station and network environment. See later sections for more information on zone discovery, automatic zone configuration, and the use of the Exclusion List feature. Also, increasing system memory on the NNM AE Extended Topology server can improve NNM AE Extended Topology speed, especially during discovery.

This chapter presents ideas and techniques that have proven valuable in improving the scale and performance of NNM AE Extended Topology. There are three main approaches:

· Target the critical or most important devices for management.
· Use the zone discovery feature to minimize resource consumption during the discovery process.
· Use the incremental discovery feature to quickly discover one portion of the environment at a time.

Targeting Your Critical Devices

Focus your network management on the critical or most important devices, such as switches and routers. That is, target the backbone of the network first. Generally, you should avoid managing end nodes (for example, personal computers or printers) unless the end node is identified as a critical resource. For example, database and application servers are usually considered critical resources. The main tool available for targeted management is the Exclusion List feature; its use is discussed below. NNM also has an auto-unmanage feature, which unmanages certain interfaces in your topology based on user-defined filtering criteria. This can be very helpful in limiting what is managed, discovered, and polled. You can find more information about this feature in the ovautoifmgr Reference Pages on Windows or by using the UNIX commands "man ovautoifmgr" and "man 4 ovautoifmgr.conf".


The following suggestions can help you improve the performance of discovery and topology data processing.

· If you use the netmon poller for status polling and you do not require meshing data, disable Mesh Analysis for netmon: "ovstop -c ovet_pathengine".
· If you use the netmon poller for status polling, stop netmon during the NNM AE Extended Topology discovery process: ovstop -c netmon. Note that you must ensure that the netmon process has polled all managed nodes at least once before NNM AE Extended Topology is even enabled.
· Increasing physical memory (RAM) can substantially improve the performance of the discovery process.
· Increasing swap space can improve performance. There is, however, a point after which increasing swap space no longer helps. Make sure you have at least the minimum swap space recommended for your environment (see Chapter 2, Management System Sizing, for details). You might also experiment with increasing swap space to determine whether performance improves.

Using the Exclusion List

The NNM AE Extended Topology Exclusion List feature is used to specify a list of nodes that should not be discovered. It is enabled by creating the file $OV_CONF/nnmet/bridge.noDiscover. The file can contain the list of nodes to be excluded or a list of filters. If filters are used, steps 3-6 listed below are replaced by editing bridge.noDiscover to add filter information. You can find more information on filters in the bridge.noDiscover Reference Page on Windows or by using the UNIX command "man bridge.noDiscover". Note that the use of filters can increase processing times, which can partially offset the performance savings of not discovering the filtered nodes.
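As an illustration, a minimal bridge.noDiscover might simply list the IP addresses of nodes to exclude, one per line (the addresses and comments here are hypothetical; step 6 below shows how a "#" sign can separate an address from trailing text):

    10.1.2.3    # end-node printer, excluded from Extended Topology discovery
    10.1.2.4    # end-node printer
    10.1.5.17   # lab PC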


The following is an example series of commands showing how you might use an Exclusion List to trim the scope of NNM AE Extended Topology from one thousand nodes down to four hundred.

1. $OV_BIN/setupExtTopo.ovpl or $OV_BIN/etrestart.ovpl: These commands start NNM AE Extended Topology. If NNM AE Extended Topology is not enabled, enable it by running setupExtTopo.ovpl. If NNM AE Extended Topology has already been enabled, restart it by running etrestart.ovpl. In our example, NNM AE Extended Topology will initially discover 1000 nodes.

2. ovstop -c ovet_auth: This command stops ovet_auth, which in turn stops NNM AE Extended Topology.

3. wc -l $OV_DB/nnmet/hosts.nnm: This command returns the total number of lines in the hosts.nnm file. In this example, it should return "1000".

4. cp $OV_DB/nnmet/hosts.nnm $OV_CONF/nnmet/bridge.noDiscover: This command copies the hosts.nnm file to bridge.noDiscover.

5. Edit bridge.noDiscover. The bridge.noDiscover file has both IP addresses and names. Delete the lines specifying nodes that need to be discovered (400 in this example). Leave in the lines specifying the nodes to be ignored, as this is a "no discover" file.

6. cut -f 1 bridge.noDiscover > tmpfile ; mv tmpfile bridge.noDiscover: The bridge.noDiscover file only takes IP addresses, but the hosts.nnm file includes node names, which need to be removed. You can either edit the bridge.noDiscover file and place a "#" sign between each IP address and its name, or run a cut command to fetch just the IP addresses and move the results back to bridge.noDiscover. Note that on the Windows platform, the cut command is only supported if you have a UNIX emulation toolset. If you don't have such a toolset, you'll have to edit the file manually.

7. $OV_BIN/etrestart.ovpl: Restart NNM AE Extended Topology so that it uses the updated bridge.noDiscover file. You can also restart Extended Topology via the Extended Topology Configuration GUI with the "Initiate Full Discovery" button.


8. wc -l $OV_DB/nnmet/hosts.nnm: After some time, run this command again to verify that the new hosts.nnm file has been reduced in size (to 400 in our example).
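Condensed into a single UNIX shell session, the procedure above looks roughly like this (a sketch only; the editing step is abbreviated, and the Windows caveat about cut from step 6 still applies):

    $OV_BIN/setupExtTopo.ovpl                    # or $OV_BIN/etrestart.ovpl if already enabled
    ovstop -c ovet_auth                          # stop Extended Topology
    wc -l $OV_DB/nnmet/hosts.nnm                 # expect 1000 in this example
    cp $OV_DB/nnmet/hosts.nnm $OV_CONF/nnmet/bridge.noDiscover
    vi $OV_CONF/nnmet/bridge.noDiscover          # delete the lines for nodes to be discovered
    cut -f 1 $OV_CONF/nnmet/bridge.noDiscover > tmpfile
    mv tmpfile $OV_CONF/nnmet/bridge.noDiscover  # keep the IP addresses only
    $OV_BIN/etrestart.ovpl                       # rediscover using the exclusion list
    wc -l $OV_DB/nnmet/hosts.nnm                 # expect 400 once discovery completes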

Zone Discovery

By default, NNM AE Extended Topology discovers Layer 2 information from all nodes and devices being managed by NNM. The discovery process needs to complete before network modeling and views are available for use. The discovery and modeling processes are very CPU-intensive, with a very large memory footprint, and can stretch system resources to peak usage in larger environments. Because of this, the Zone Discovery feature is particularly useful in larger environments, where single-zone discovery pushes system resources to their limits. Zone discovery divides a network into a number of zones, discovering each zone independently and merging the zones together through connections at their edges, represented by nodes that appear in multiple zones. Zones can be created with Automatic Zone Configuration or by hand. Zone configuration can be edited, even for zones that were created automatically.

Zones support the scalability of Extended Topology discovery. Note that zone-based discovery only helps during the discovery phase. The benefits of using zones are especially magnified in large-scale environments with a relatively small deployment machine, since fewer system resources are consumed during the discovery process. The Automatic Zone Configuration feature accounts for such factors and can configure your zones for you automatically. Note that the setupExtTopo.ovpl script will alert you if it determines that zones would be beneficial, and will allow you to automatically configure zones at that time. If setupExtTopo.ovpl does not recommend configuring zones, you are likely to have good performance without configuring zones.


Zones are displayed, and can be defined or edited manually, using the NNM AE Extended Topology Configuration GUI. See the "Using NNM AE Extended Topology" manual for details. Using more zones naturally improves discovery time. In large environments, this improvement, and the reduced system resource usage during discovery, will be significant.

Automatic Zone Configuration

Automatic Zone Configuration ("AutoZone") removes zone configuration complexity for the user. Zone sizes are defined by the number of managed nodes they contain plus the number of all interfaces on those nodes. AutoZone uses this concept to produce zones that may contain fewer complex devices, or a greater number of simple devices (such as end nodes), so that the processing burden is fairly evenly distributed across all zones. The "Test All Zones" feature verifies that a zone configuration (created either automatically or by hand) would result in an efficient, accurate, and successful discovery. On occasion, AutoZone and "Test All Zones" feedback may suggest that the user edit one or more zones. Because of this, it is good to understand how to create well-defined zones, though users will usually not need to create them by hand.

Zone Configuration Considerations

Considerations for creating well-defined zones include:

· Consider dividing network devices into groups using geography (for example, by city, site, or building).
· Switches that are connected together within a subnet should not be separated into different zones, if possible. If they must be separated due to a very large subnet, take care to have overlap across zones for any split that crosses a direct physical connection. For example, say that switch X and switch Y are directly connected and are in separate zones, zone 1 and zone 2. In this case, make sure that at least zone 1 or zone 2 has both switches. That is, zone 1 would have switches X and Y and zone 2 would have switch X, or zone 1 would have switch Y and zone 2 would have switches X and Y.
· Smaller subnets can be combined to form a zone.


· If two subnets share a common router, and the subnets are in different zones, place the common router in both zones.
· The maximum size of a zone, as created by the zone configuration tool and as documented for manually created zones, should be somewhere between 4000 and 6000 objects (nodes + interfaces).

For further information, see the "Using NNM AE Extended Topology" manual, in the NNM AE Extended Topology Discovery chapter under the section "Running NNM AE Extended Topology Discovery".
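As a rough worked example of the zone-size guideline, consider the Site F environment from Chapter 3: 700 nodes plus 31715 interfaces is roughly 32400 objects, which at 4000 to 6000 objects per zone suggests somewhere between six and eight zones; that site in fact ran with ten.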

Incremental Discovery

In a large environment, a full discovery of the entire network can take several hours. The Incremental Discovery feature, or per-zone discovery, allows users to update topology information quickly by discovering just the portion (zone) of the network that they are interested in, rather than having to rediscover the entire network all at once. Incremental discovery of a particular zone is invoked via the NNM AE Extended Topology Configuration GUI, and provides a fast way to rediscover a portion of the network. This can be especially useful when adding a new zone containing new devices.

DNS Performance Issues

DNS lookups are blocking and can cause significant slowdown. One solution is to run a caching-only secondary DNS server on the NNM server. Additionally, slowdowns caused by DNS lookup failures (reverse lookups) can be reduced with the ipNoLookup.conf file. NNM maintains a no-lookup cache that is used to prevent forward name lookups for names that do not resolve to IP addresses. The entries in the no-lookup cache can be controlled using the $OV_BIN/snmpnolookupconf command: snmpnolookupconf [-add | -load | -disable].
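For example, the out-of-the-box tests described in Chapter 5 suppressed all hostname resolution by placing an ipNoLookup.conf file containing a fully wildcarded IP address in the $OV_CONF directory. A hypothetical sketch of maintaining the no-lookup cache with the command above (the argument forms shown here are illustrative, not confirmed; the exact usage of each option is described in the snmpnolookupconf reference page):

    $OV_BIN/snmpnolookupconf -add stale-device.example.com   # hypothetical name to suppress
    $OV_BIN/snmpnolookupconf -load                            # hypothetical cache reload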


5 Polling Considerations


This chapter describes the types of polling performed by NNM’s netmon poller and the load on the network associated with each polling type. The netmon poller is the only poller available with NNM Starter Edition and the default poller in NNM Advanced Edition.

Device Status Polling

The time required to update the status of an IP interface is generally equal to the round-trip time of an ICMP echo request to that interface. However, it may be several times the round-trip time, because a few of the pings (or the responses) may be dropped. You can use the Options → SNMP Configuration menu item to individually configure the status polling interval, the timeout, and the number of retries for an interface.

Each IP status poll generates about 4-5 small packets on the local LAN (due to ARPs) and two packets on the remote LANs. IPX (available on Windows only) generates a similar number of packets per poll. The default status-polling interval is 15 minutes (900 seconds). With a 300-node network where 250 of the nodes have only one interface per node and 50 nodes have three interfaces per node, you will poll one interface every 2.25 seconds and generate only 5 to 7 packets per second. (The 4-5 packets for each IP interface depend on how fast the ARP cache entries on the local system time out. If the polling cycle time is less than the ARP cache entry time-out, then it is just about two packets per second on the local station.) As you increase the size of the network and/or decrease the status-polling interval, the number of packets generated per second increases proportionally.

If the status-polling interval is too short for the number of interfaces managed, there is not enough time for every ICMP echo request and IPX diagnostic request to reach its targeted managed interface and return. This would be like asking an interviewer to complete 100 five-minute phone calls in 8 hours; it cannot be done. The round-trip time of each ICMP echo request multiplied by the number of managed interfaces can exceed the status-polling interval, just as the length of each phone call multiplied by the number of phone calls exceeds the total eight-hour period given to the interviewer in our example.
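Put as an inequality (an informal restatement of the analogy above), status polling keeps up only if:

    NumberOfManagedInterfaces x AverageRoundTripTime <= StatusPollingInterval

just as 100 calls x 5 minutes = 500 minutes exceeds the interviewer's 480 available minutes.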


A high incidence of paired node down / node up events is usually an indication of a busy LAN with either an excessively short polling cycle or inadequate timeouts. Also, nodes that are in a node down status will slow the polling process due to the amount of time spent waiting for timeouts to occur. The time-out period doubles with each successive retry, causing the netmon poller to delay a significant amount of time when encountering a down node. To determine the packets per second (PPS) generated for a status poll, use the following formula:

    PPS = (NumberOfInterfaces x 5) / PollingInterval
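For example, the 300-node network described above has 400 managed interfaces; on the default 900-second status-polling interval, the formula gives PPS = (400 x 5) / 900, or roughly 2.2 packets per second.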

The default polling interval of 300 seconds (5 minutes) should meet most customer needs. Reducing the polling interval to the range of a few seconds allows a down device to be detected quickly, but it can also cause problems (especially with larger managed networks).

Other Polling Types

In addition to status polling, there are other types of polling that cause network traffic:

Device Discovery Polling

· IP discovery polling requires 10 to 100 packets per node, depending on the type of node. Gathering ARP cache and routing table entries requires an additional two packets per entry, and routers may have hundreds or thousands of entries. Devices that don't support SNMP take about three packets. The frequency of updates on particular nodes varies depending on how often new nodes are found and the amount of information from those particular nodes (from one minute to every 24 hours).
· IPX discovery polling (available on Windows only) is accomplished via a different mechanism than IP polling. Various IPX requests, including some broadcast requests, are sent out to collect information on devices that support IPX. In typical IPX networks, three to four packets are generated per IPX node during the IPX discovery poll. The IPX discovery poll interval is, by default, every six hours.


Topology Configuration Polling

The number of packets generated for this poll depends greatly on how many interfaces in your network communicate through devices that support the Bridge MIB (RFC 1493). In a highly bridged or switched environment, expect 10 to 20 packets to be generated per interface in the network. The default configuration-polling interval is every four hours.

Device Configuration Check Polling

During configuration check polls, 20 to 100 packets per node are generated, depending on the size of the information requested. The default is to complete this poll once per day for each node.

Collection Station Status Polling

The number of packets generated for this poll is approximately 20 packets per collection station. The frequency of this type of polling is configured on a per collection station basis. The default frequency is every five minutes.

Other Applications

Extra network traffic is generated if other applications (such as CiscoWorks LAN Management Solution or other third-party element management tools) are used to collect data, monitor MIB variables, check thresholds, create graphs, or connect to remote systems.


Device Discovery Polling Times

The following table lists approximate times for initial device discovery of 1000 network nodes. Actual discovery times will, of course, vary based on server configuration and network topology.

Platform           DNS enabled      DNS disabled
HP-UX on PA-RISC   <1 minute        <1 minute
HP-UX on Itanium   <1.5 minutes     <1 minute
Red Hat Linux      <1.5 minutes     <1 minute
Solaris            <1.5 minutes     <1 minute
Windows            (being tested)   (being tested)

Table 5-1. Approximate device discovery times

In this somewhat limited test case, discovery times did not vary significantly between DNS enabled and DNS disabled; in larger environments, however, DNS can have a significant impact. Customers are encouraged to use the ipNoLookup.conf file to offload the DNS lookup responsibilities handled by netmon.

Polling Limits in Large Environments: Test Cases and Results

Considerations in large-scale NNM environments include understanding the limitations for polling large numbers of nodes and interfaces. Table 5-2 is the result of tests run in NNM production environments. It shows the working limits for several different HP-UX systems in large polling environments. The "polling limit" was considered reached when the netmon polling process started falling significantly behind.


System Class                                        Nodes   Interfaces   Objects
HP 9000 C360 – 768MB – 367MHz – Single Processor    6000    30000        45000
HP 9000 C3000 – 768MB – 400MHz – Single Processor   10000   40000        60000
HP 9000 C3600 – 1GB – 550MHz – Single Processor     15000   40000        60000
HP 9000 C3700 – 2GB – 750MHz – Single Processor     15000   40000        60000
HP 9000 J2240 – 2GB – 240MHz – Dual Processor –     10000   40000        60000
  4 striped disks/9GB each
HP 9000 J5600 – 4GB – 550MHz – Dual Processor –     30000   70000        105000
  2 striped disks/9GB each – Gigabit Fiber
HP 9000 L2000 – 4GB – 440MHz – Four Processor –     25000   60000        85000
  4 striped disks/9GB each
HP 9000 N4000 – 4GB – 440MHz – Eight Processor –    30000   70000        105000
  4 striped disks/9GB each – Gigabit Fiber

Table 5-2. Polling Limits


This was an out-of-the-box test, and all the NNM defaults were used. Significant default values included:

NNM Configuration Parameter      Default Value
Status Polling Interval          15 minutes, except routers (3 minutes, 1 minute primary),
                                 switches (5 minutes, 1.5 minutes primary), and
                                 hubs (7.5 minutes, 7.5 minutes primary)
Configuration Polling Interval   24 hours
ICMP Queue Length                20
SNMP Queue Length                20
Submaps                          NOT On-Demand

Table 5-3. NNM Configuration Defaults Used

This test suite did not test DNS lookup, as DNS lookup performance can vary significantly in different customer environments. For these tests, an ipNoLookup.conf file containing a fully wildcarded IP address was added to the $OV_CONF directory of all test systems. This was to prevent netmon from doing any hostname resolution. Discovery was also turned off, in order to determine the scalability of NNM after a steady state had been reached.

Event Processing Performance

Available memory will affect NNM event processing performance. Also, replacing the native SNMP agent with the EMANATE SNMP agent can improve event processing performance. For example, two Windows systems were configured to send and receive SNMP events. The sender generated 45000 events (traps), 500 events at a time, with 30 seconds between each batch of 500. The amount of RAM had a significant effect on the event processing rate (and thereby the percentage of dropped events). On the memory-limited system, event processing performance was noticeably improved through use of the EMANATE SNMP agent and an increase in the size of the UDP socket buffer.


                                 532 Mb RAM        1 Gb RAM
Default ovtrapd configuration    8.5 events/sec    49 events/sec
                                 55% drop rate     0.12% drop rate
EMANATE SNMP agent with 250 Kb   13.3 events/sec   49 events/sec
UDP socket buffer size           37% drop rate     0.13% drop rate

Table 5-4. Event Processing and Drop Rates

Note that use of the EMANATE SNMP agent and increased UDP socket buffer size can also improve communication performance between a management station and a collection station. See Chapter 7, NNM Application Tuning, for information on configuring ovtrapd to take advantage of the EMANATE SNMP agent and to increase the UDP socket buffer size. See Chapter 10, NNM SE Sizing Considerations, for more information on management and collection stations.

Event Reduction

For the scalability measurements of a standalone system, a high performance test system was used. This was an 8-processor HP 9000 N4000 440MHz with HP-UX 11.11 and 4Gb RAM. Two optimization techniques were employed: the kernel parameter dbc_max_pct was lowered to 10%, and the database directory was symbolically linked to a set of 4 striped external disks. The main objective of the standalone testing was to characterize event processing performance. For this characterization, event databases were collected from 4 customer sites to analyze statistics on event reduction and processing rates. While these real databases included a representative distribution and variety of event types, the rate at which they were injected into the test system was unrealistically high. This high event rate helped characterize the maximum processing rate, but may have created some pathological conditions.


Briefly, the customers from whom the event databases were collected included:

· Customer A: a multi-national computer manufacturer with Extreme and Cisco switches and routers in a routed network and meshed router core. The customer's network comprised approximately 1000 nodes, 55,000 interfaces, and 600 VLANs.
· Customer B: a multi-national computer manufacturer with primarily Cisco routers. The customer's network consisted of 700 nodes, 30,000 interfaces, and 120 VLANs.
· Customer C: a large financial institution. The customer network contained only Cisco routers.
· Customer D: a state government organization. This customer's network was also a pure Cisco, routed network with 800 nodes, 10,000 interfaces, and 50 VLANs.

For each of these customer databases, the events were injected directly into a running pmd using the internal tool feedPmd. A typical invocation was feedPmd -n -l 65 -d eventdb. This invocation suppressed the internal action events (-n) and sent the events at 65 per second from the data in the directory eventdb. Event rates were measured by monitoring the log file $OV_LOG/pmd.log with the pmd option set in pmd.lrf as follows:

Ovs_YES_START::-SOV_EVENT;t;s1000:Ovs_WELL_BEHAVED:30:PAUSE

To characterize the makeup of these event databases, the events were converted to textual form with ovdumpevents -s default. The events were collected, and the different event types were counted with $OV_SUPPORT/processEvents. In order not to overwhelm the display of events in the event browser, or hide important events in a sea of less important events, a number of filtering techniques were employed.

· Events were classified by their OID. Selected types were categorized as "logonly" and were not displayed.
· ECS circuits within pmd processed the events, and only the root causes of correlated events were displayed.


· New ECS Composer circuits with de-duplication and pattern delete processes were used.

To measure the effectiveness of the various filters, test databases were injected with various configurations, the results were displayed in the event browser, and the total count of displayed events was recorded.

· Raw: a count of events sent by feedPmd
· Less logonly: a set of scripts counted the number of events left over after the special "logonly" events were subtracted
· NNM 6.2: for historical comparison
· ECS (NNM 6.31): the number of events reported by an NNM 6.31 system. This measures the effectiveness of the ECS circuits.
· De-dup (NNM 6.4x): with the addition of the ECS Composer circuit plus the de-dup and pattern delete operations

Displayed events    Customer A   Customer B   Customer C   Customer D
Raw                 27430        37375        57785        37895
Less logonly        5916         12107        57574        27693
NNM 6.2             377          >3500        >3500        3446
ECS (NNM 6.31)      613          >3500        3422         >3500
De-dup (NNM 6.4x)   119          247          3421         2035
Customized          778

Table 5-5. Event Reduction Results from Standalone Polling

There is a fixed limit of 3500 events that can be displayed in the browser at any given time. For datasets B and D on NNM 6.31, this limit was reached. These data show that there have been significant improvements over NNM 6.31 in some cases (5:1 for Customer A, better than 14:1 for Customer B) and little change in others (Customer C). The increase from 6.2 to 6.31 for Customer A is because additional OV_IF_Down events were not correlated to node down events.


Event Rate

To measure the event-processing rate, feedPmd ran with the 4 datasets. The feedPmd program prints a status line every time a block of events has been sent out (nominally 1 second), but if the processing rate is less than the feedPmd rate, the status lines pause until buffers have been emptied. The feedPmd rate was increased until pmd was saturated, then the pmd log file report of "Event in" and "Event out" rates was recorded. Those trends are charted below for the 4 data sets.

[Chart: Event-in rate, in events/second (0 to 100), versus thousands of events injected (kEvents), for data sets A, B, C, and D.]


Two consecutive runs of Customer D’s data were performed to see if the gradual decline of the event rate would continue. Indeed, the next graph shows a complete recovery if the ECS configuration GUI is used to disable, then re-enable, each ECS circuit between runs:

[Chart: Two consecutive runs of Customer D data with ECS reset between runs, in events/second (0 to 90) versus kEvents.]


With ECS turned off (ecsmgr –reset), the event rate fluctuates rapidly, but is significantly higher than with ECS on:

[Chart: Event-in rate with ECS off, in events/second (0 to 100) versus kEvents, for data sets A, B, C, and D.]

When the event rates are this high, downstream processes such as ovalarmsrv can become saturated. The rates were also measured with ECS off and deduplication also off (by uncommenting the line "#DEDUPLICATION=OFF" in $OV_CONF/dedup.conf), with essentially the same results as above.


To see if the gradual slowdown is due to the new Composer circuit, rates were measured with all default settings except that the Composer circuit was disabled. The event rates are somewhat faster, but the slowdown didn’t completely disappear:

[Chart: Event-in rate with the Composer circuit off, in events/sec (0 to 160) versus kEvents, for data sets A, B, C, and D.]

Even better performance might be achieved by shrinking the eventdb size.

6 System Tuning


This chapter describes recommendations and tuning options for HP-UX, Solaris, and Windows systems to improve overall NNM performance.

NNM Advanced Edition System Tuning

The following tables list recommended values for Unix kernel parameters in order to tune system performance for HP OpenView Network Node Manager Advanced Edition.

Kernel Parameter   Recommended Value
max_thread_proc    1024
nfile              8192
maxdsiz            1073741824 (1 GB)
maxfiles           2048
maxfiles_lim       2048
maxusers           256
ncallout           6000
nkthread           6000
nproc              2068
shmmni             500
shmseg             250
semmns             300
semmnu             250
semume             250

Table 6-1. HP-UX Recommended Kernel Parameter Values

Kernel Parameter        Recommended Value
shmsys:shminfo_shmmni   500
shmsys:shminfo_shmseg   250
semsys:seminfo_semmns   300
semsys:seminfo_semmnu   250
semsys:seminfo_semume   250

Table 6-2. Solaris Recommended Kernel Parameter Values

For Windows, the following configuration changes are recommended:

· Adjust processor scheduling, if configurable, for best performance of background processes.


· Adjust memory usage, if configurable, for best performance of system cache.
· Configure the initial paging size to be a minimum of 1024 Mb in the system's virtual memory configuration.
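As an illustration of applying the Unix values above (a sketch only; the exact mechanism depends on your operating system release), the Solaris parameters in Table 6-2 are conventionally set in /etc/system and take effect after a reboot:

    set shmsys:shminfo_shmmni=500
    set shmsys:shminfo_shmseg=250
    set semsys:seminfo_semmns=300
    set semsys:seminfo_semmnu=250
    set semsys:seminfo_semume=250

On HP-UX 11.x, the corresponding Table 6-1 values can typically be set with kmtune (for example, kmtune -s maxdsiz=1073741824) or through SAM, followed by a kernel rebuild and reboot.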

Disks

For the performance testing environment used to generate the results outlined in this guide, most of the HP-UX workstation systems used as collection stations had only a single root disk. The HP-UX servers and some of the higher-end HP-UX workstations had either 2, 3, or 4 disk drives LVM-striped together to improve I/O performance. Similar LVM striping should be considered essential for maintaining a system with good I/O performance. Other kernel tuning parameters that should be considered, with the associated risks evaluated, are default_disk_ir and fs_async.

NNM Starter Edition Memory Requirements

· Entry-level Management Server: On HP-UX, a minimum of 768 MB of RAM, with 1 GB recommended. On 768 MB or 1 GB systems, a kernel tune that can help conserve RAM is to lower the kernel default (50%) disk buffer caching limit (dbc_max_pct) to 10%.
· Mid-Range Management Server: 2 GB minimum, but 4 GB is recommended.
· Large Management Server: 4 GB minimum, but 8 GB is recommended.


7 NNM Application Tuning


This chapter describes tips and techniques for tuning NNM to improve performance or to reduce consumption of memory, swap/paging file space, disk space, and processing power. The conservation tips allow you to configure more operators to manage more nodes.

The On-Demand Submap Feature

This feature allows you to specify, for a specific map, the level in the submap hierarchy at which submaps remain in memory throughout the life of an ovw session. Submaps below the specified level are created from topology data whenever you open them from an ovw session. As seen in previous sections, the use of the on-demand submap feature may greatly reduce the amount of memory required to run ovw sessions.

The memory savings do not come without a cost. When opening submaps that are transient (not in memory), some extra processor work is expended, and some delay may be noticeable. Unless the submap being opened contains a large number of objects (several hundred), the delay incurred and the processor work expended are negligible. If this delay becomes unacceptable, you have the option to configure the specific submap to be persistent. This places the submap in the set of submaps stored in memory during the full life of the ovw session. Note that some third-party applications require their objects, or all objects, to be resident in memory at all times. You may set your demand level, or use a persistence filter with your demand level, to accomplish this.

The ovwdb -n Option

The best performance for various operations occurs when ovw has all objects loaded in memory. However, if there are too many objects and not enough physical memory to hold them, excessive swapping/paging occurs and negates the performance benefits of having all objects loaded in memory. For this reason, a limit can be placed on the ovwdb object cache to enforce an upper bound on the amount of memory used for object caching. The number of objects ovwdb stores in memory can be limited through the use of ovwdb's -n option. Modify $OV_LRF/ovwdb.lrf to start ovwdb with the -n option.


If -n is specified, the default number of objects stored in memory is 5000. This value can be modified; however, the limit should not be set too low either, as that can lead to performance problems due to excessive disk activity. Change the start-up configuration of ovwdb by editing the $OV_LRF/ovwdb.lrf file, then executing the command:

UNIX:    $OV_BIN/ovaddobj $OV_LRF/ovwdb.lrf
Windows: %OV_BIN%/ovaddobj %OV_LRF%/ovwdb.lrf

The ovwdb services must be restarted for the new configuration to take effect.
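A minimal restart sequence on UNIX might look like this (a sketch, assuming the standard ovstop/ovstart process-control commands):

    ovstop ovwdb                           # stop the object database service
    $OV_BIN/ovaddobj $OV_LRF/ovwdb.lrf     # register the edited lrf
    ovstart ovwdb                          # restart with the new -n limit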

Filtering

In many configurations, memory, swap/paging file space, processor work, network bandwidth, and disk space are wasted on devices in which the user has no interest. Three types of filters (discovery, topology, and map) help limit the set of devices on which the NNM product operates.

Discovery Filters

Discovery filters define the set of devices that NNM discovers and actively monitors. These filters remove unwanted devices before they actually enter the network topology. Using discovery filters yields savings throughout the system, including the following:

· Reduced memory and disk space requirements. The system stores fewer objects in the topology, object, and map databases.
· Reduced processor work requirements. NNM actively monitors fewer devices.
· Network bandwidth savings. If you apply a discovery filter at collection stations, you save network bandwidth because less information flows from the collection station to the management station.

You may also use the netmon.noDiscover file, as documented in the netmon online reference for NNM (man page on UNIX). Whereas a discovery filter uses several packets to discover a node and then filters the information out, the IP addresses listed in a netmon.noDiscover file are completely excluded (a sketch of this file appears at the end of this section).

Topology Filters

Topology filters define the set of objects that collection stations export to management stations. From the management station's perspective, applying topology filters at the collection stations has an effect similar to applying discovery filters at the management station: fewer objects flowing through the system reduces resource requirements. From the collection station's perspective, a processing cost is incurred because objects are checked against the filter whenever they change. This cost is offset when a significant number of objects are filtered out, since no information about those objects need be sent to the management station.

Map Filters

Map filters define the set of objects included in a map and subsequently displayed in an ovw session. Reducing the size of maps reduces the disk space used by the maps and the memory used by the ovw session processes. If a map filter reduces the number of nodes in a map by 1000, approximately 5MB of memory is saved in a single ovw session, and approximately 1.5MB of disk space may also be saved in the map database.
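A sketch of a netmon.noDiscover file, assuming the conventional $OV_CONF location and hypothetical addresses; see the netmon reference page for the exact syntax supported by your NNM version:

  # $OV_CONF/netmon.noDiscover: one IP address per line.
  # Addresses listed here are never discovered; netmon sends them no
  # discovery packets at all.
  10.1.2.3
  10.1.2.4
  192.168.100.250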

Reclaiming Memory by Restarting

Some extra memory is consumed in the course of initial discovery the first time that NNM is deployed in your network environment. When initial discovery is complete, it may help to stop and restart NNM to reclaim this extra memory. To restart NNM:

1. Exit all ovw sessions.
2. Execute ovstop.
3. Execute ovstart.
4. Restart your ovw sessions.

Management Consoles

Management consoles are used to offload work from the management server, allowing more ovw sessions by reducing the consumption of CPU cycles and system memory on the server. Through this manual process, the ovw load is distributed to one or more management consoles.

Benefits

As stated above, more processing power is available for ovw clients, and CPU resources on the management server are no longer consumed or shared by those sessions. System memory is also conserved on the management server, because the memory is used on the management console client system instead. Additional capacity can be gained by adding resources on a management console, rather than subtracting it from the processor and memory needed by the management server's other operating requirements.

Possible Limitations

Two scenarios may cause a loss of performance when moving from native (local) ovw sessions on the management server to non-native (remote) access between the management server and a management console (client). Instead of accessing the data locally on the management server, you must now account for the extra time, overhead, and latency of transmitting the data across a local LAN (for a nearby system on the same network or subnet), or for the additional latency of accessing the data from a remote off-site location via a WAN.

Refer to A Guide to Scalability and Distribution for HP OpenView Network Node Manager for more information on management consoles.


Backup and Recovery Performance

A key NNM administration task is the regular NNM backup, using ovbackup.ovpl. Expect an Oracle-based configuration to take about 5 minutes per 100 Mb of data for the operational portion of the backup (when NNM is paused). Recovery (with ovrestore.ovpl) should take about 3 minutes per 100 Mb for the operational portion of the restore in an Oracle-based configuration, and the analytical phase of the restore should take about 30 seconds per 100 Mb of data. The data size described here is the combined size of all files placed in $OV_TMP/ovbackup/ during backup.
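As an illustration of these rates, consider a hypothetical 1 Gb (roughly 1,000 Mb) of combined data in $OV_TMP/ovbackup/: the operational portion of the backup would take about 10 x 5 = 50 minutes, the operational portion of the restore about 10 x 3 = 30 minutes, and the analytical phase of the restore about 10 x 30 seconds = 5 minutes.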

Using EMANATE SNMP Agent on Windows

As discussed in Chapter 5, Polling Considerations, replacing the default SNMP agent with the EMANATE SNMP agent can improve event processing performance as well as increase communication bandwidth between a management station and a collection station. This is particularly significant on Windows systems. Increasing the UDP socket buffer size on a memory-limited system can also improve the performance of event processing and distributed management communication. Use ovtrapd's -W option to replace the default Windows SNMP agent (WinSNMP) and the -U option to increase the UDP socket buffer size. For example, to replace the default WinSNMP agent and increase the UDP socket buffer size to 200 Kb:

· Edit $OV_LRF/ovtrapd.lrf to specify -W and -U 200 as start-up options:

  OVs_YES_START:pmd:-W -U 200:OVs_WELL_BEHAVED::

· Run the ovaddobj command to update the configuration information:

  UNIX: $OV_BIN/ovaddobj $OV_LRF/ovtrapd.lrf
  Windows: %OV_BIN%/ovaddobj %OV_LRF%/ovtrapd.lrf

· Restart ovtrapd.


Other Factors Affecting NNM Performance

Some variables important to sizing NNM configurations include:

· What is the physical layout of the network? Slow connections, overloaded gateways, and so on can impact the discovery and monitoring capability of Network Node Manager.
· What is the volatility of the network? Stable networks produce less discovery and layout work, and stable devices on the network produce less monitoring work. Such stability reduces system resource consumption and network bandwidth usage.
· What polling frequencies will be used? Network Node Manager uses various types of polling to stay updated on the status and configuration of network devices as well as remote NNM installations. The more frequently these polls are executed, the more network and processor load is placed on the network management system.
· How much data collection for trend analysis and threshold monitoring is required? Collecting large amounts of data for trend analysis or threshold monitoring can consume a great deal of processing power and network bandwidth, and trend data collected over a long period can also consume a great deal of disk space.
· What is the trap volume in the network to be managed? Traps consume processing resources as well as network bandwidth, especially when you forward them to other NNM stations. Automatic actions executed on receipt of events also have the potential to consume processor, memory, network, and disk space resources.
· How many concurrent operators will use the network management system? Each operator session consumes processing power, system memory, and potentially network bandwidth.
· To what degree is your NNM configuration distributed? NNM supports a model in which multiple NNM stations share the load of the various tasks necessary to manage your environment. This form of distribution can reduce the load on individual systems as well as the network bandwidth required to manage a large environment.
· To what degree is the On-Demand Submap feature used in the environment? The On-Demand Submap feature allows the user to trade memory for processing resources in the services that present a network map to the user. Extensive use of this feature can dramatically reduce the memory usage of these services.
· How many devices are to be managed, and how many network interfaces do these managed devices contain? The number of managed objects in the network management domain should always be carefully calculated.

8 Distributed Management


Note that the topology synchronization discussed in this chapter, and the DIDM testing and results discussed in the next chapter, do not apply to the Extended Topology capabilities in NNM Advanced Edition. Extended Topology information is not synchronized on a management station from collection stations: events are propagated from collection stations to the management station, but Extended Topology information is not. To see Extended Topology information in a dynamic view, you must access the appropriate URL on the collection station, not on the management station.

NNM allows multiple NNM stations to share Layer 3 topology information. This Distributed Internet Discovery and Monitoring (DIDM) feature has two logical components: the collection station and the management station. This section provides an overview of some of the characteristics of a DIDM environment that affect performance. In addition to the DIDM information presented here, the next section discusses data collected while testing and monitoring different DIDM environments. That data is meant to give the reader a basic understanding of NNM's performance in a DIDM environment.

Management stations and collection stations have the same function, except that collection stations export topology information to other NNM stations while management stations import topology information from other NNM stations. A single NNM station can act concurrently as both a management station and a collection station. The sizing of management stations and collection stations is very similar to the stand-alone model; there are, however, some unique characteristics.

Topology Synchronization

When a management station recognizes that its topology is not synchronized with that of a collection station it is managing (importing topology data from), it begins "synchronizing" its topology with that of the collection station. That is, the management station queries the collection station for all topology information that passes the collection station's topology filter. Based on what it learns, the management station then updates the local topology.


Topology synchronization is the most time-consuming and CPU-intensive process in DIDM, so it is advantageous to streamline it as much as possible. Conditions causing the management station to synchronize with a collection station include:

· A collection station's managed state is changed from unmanaged to managed.
· The management station detects that it has missed a topology-related event from the collection station. (This happens rarely, because the event forwarding mechanism is TCP-based.)
· A previously configured management station is started up (ovrepld is restarted via ovstart) and the management station missed a topology event while it was offline.
· The management station detects that the collection station has reset its topology database. (This happens when a collection station restores its topology database from a backup, or deletes and recreates its topology database.)
· The user forces synchronization with a collection station via the nmdemandpoll -s command (see the example following this list).
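For example, to force a full synchronization with a single collection station (the hostname here is hypothetical):

  nmdemandpoll -s cs01.example.com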

Collection Station Synchronization Duration

Several factors influence the time required to synchronize a collection station. These factors include:

· Processor speed on the management station and the collection station.
· Network bandwidth available between the management station and the collection station.
· Number of nodes exported by the collection station.
· Number of changes required to the topology as a result of the synchronization.


9 DIDM Testing and Results


NNM Distributed Environment Test Setup

Six NNM distributed environments were set up and analyzed. All of these environments follow the Centralized-Hierarchical model described in A Guide to Scalability and Distribution for NNM. Table 9-1 and Table 9-2 detail the distributed environments used for performance testing. There are significant differences between these test environments that affect performance results, including:

· The type of management station being used.
· The number of collection stations being managed.
· CPU clock rates.
· The number of CPUs.
· Types of LAN connectivity.
· LAN proximity (number of hops and bandwidth of connectors between hops).
· The use of different domains.
· The use of a single disk drive (poor I/O) versus multiple LVM-striped drives for the databases.

All collection stations were loaded with unique topologies, resulting in no overlap on the management station. Each topology consisted of 5000 nodes and 10,000 interfaces (two per node). NNM used out-of-the-box settings on both the management station and the collection stations, with one exception: DNS lookup was disabled on all systems through the use of the ipNoLookup.conf file. Discovery and polling were turned off on all of the collection stations.


Configuration 1: 100,000 Nodes, 200,000 Interfaces, 300,000 Objects
  Management Station (MS): HP N4000-80; 8 750 MHz processors; 8 Gb RAM; two 9 Gb striped disk drives; HP-UX 11.11
  No. of Collection Stations (CS): 20
  MS/CS Connection: The management station and collection stations were in different domains and over 1000 miles apart. There were five hops between a collection station and the management station. The management station had a Gigabit fiber connection.
  Collection Station Pool: 13: HP C3000, 400 MHz, 1 Gb RAM; 3: HP C360, 367 MHz, 1 Gb RAM; 2: HP B2000, 400 MHz, 1 Gb RAM; 2: HP J2240, 240 MHz dual, 2 Gb RAM

Configuration 2: 50,000 Nodes, 100,000 Interfaces, 150,000 Objects
  Management Station (MS): HP N4000-80; 8 750 MHz processors; 8 Gb RAM; two 9 Gb striped disk drives; HP-UX 11.11
  No. of Collection Stations (CS): 10
  MS/CS Connection: The management station and collection stations were in different domains and over 1000 miles apart. There were five hops between a collection station and the management station. The management station had a Gigabit fiber connection.
  Collection Station Pool: 7: HP C3000, 400 MHz, 1 Gb RAM; 2: HP B2000, 400 MHz, 1 Gb RAM; 1: HP C360, 367 MHz, 1 Gb RAM

Table 9-1. Configurations for N-Class Distributed Environments


Configuration 3: 100,000 Nodes, 200,000 Interfaces, 300,000 Objects
  Management Station (MS): HP L2000; 4 400 MHz processors; 4 Gb RAM; 4 9 Gb striped disk drives; HP-UX 11.11
  No. of Collection Stations (CS): 20
  MS/CS Connection: Management and collection stations were all in the same domain and in close proximity.
  Collection Station Pool: 13: HP C3000, 400 MHz, 1 Gb RAM; 3: HP C360, 367 MHz, 1 Gb RAM; 2: HP B2000, 400 MHz, 1 Gb RAM; 2: HP J2240, 240 MHz dual, 2 Gb RAM

Configuration 4: 50,000 Nodes, 100,000 Interfaces, 150,000 Objects
  Management Station (MS): HP L2000; 4 400 MHz processors; 4 Gb RAM; 4 9 Gb striped disk drives; HP-UX 11.11
  No. of Collection Stations (CS): 10
  MS/CS Connection: Management and collection stations were all in the same domain and in close proximity.
  Collection Station Pool: 7: HP C3000, 400 MHz, 1 Gb RAM; 2: HP B2000, 400 MHz, 1 Gb RAM; 1: HP C360, 367 MHz, 1 Gb RAM


Configuration 5: 61,000 Nodes, 244,000 Interfaces, 305,000 Objects
  Management Station (MS): HP L2000; 4 400 MHz processors; 4 Gb RAM; 4 9 Gb striped disk drives; HP-UX 11.11
  No. of Collection Stations (CS): 3
  MS/CS Connection: Management and collection stations were all in the same domain and in close proximity.
  Collection Station Pool: 3: HP L2000, 440 MHz; 1.5, 2.0, and 4 Gb RAM

Configuration 6: 3,000 Nodes, 3,400 Interfaces, 6,500 Objects
  Management Station (MS): HP L3000; 2 750 MHz processors; 4 Gb RAM; HP-UX 11.11
  No. of Collection Stations (CS): 4
  MS/CS Connection: Management and collection stations were all in the same domain and in close proximity. There were 2 hops between the management station and the collection stations. Note that DNS lookup was enabled in this configuration.
  Collection Station Pool: 1: HP L3000, 750 MHz, 2.0 Gb RAM; 2: HP C3600, 552 MHz, 512 Mb RAM; 1: HP K260, 180 MHz, 640 Mb RAM

Table 9-2. Configurations for L-Class Distributed Environments


Data Collected

The data presented here is not absolute; analysis of all of the variables that affect the performance of a particular distributed environment is well beyond the scope of this document. This data is intended to provide a basic understanding of the behavior of a distributed environment and of the factors that influence its performance. Initial synchronization and resynchronization times for the distributed configurations detailed in Table 9-1 and Table 9-2 are provided, along with the maximum event flow that each of the management stations can handle.

Initial Synchronization

Collection stations were "added" to the management stations at the same time, and synchronization between the management station and all of the collection stations occurred simultaneously. Synchronizing this many collection stations at the same time was acceptable for both the 10 and 20 collection station configurations. Table 9-3 shows the initial synchronization times for the different DIDM environments.

Table 9-3. Initial Synchronization Times

  Management Station    10 CS          20 CS
  N-Class               30 minutes     180 minutes*
  L-Class               40 minutes     120 minutes

* It is possible that networking or some other system-level detail (only two disks were available for striping) on the 8-CPU system was misconfigured and contributed to this counterintuitive result.

There are three key performance factors to note:

· The 8-CPU system (N-Class) managing 10 collection stations performs better than the 4-CPU system (L-Class) managing 10 collection stations. The benefit of the N-Class system's faster clock rates outweighed the benefit of the L-Class environment having both the management station and the collection stations in the same network/domain.
· Even though the N-Class environment had more CPUs at a faster clock rate (750 MHz versus 440 MHz), the network configuration became the limiting factor in the 20-collection-station configuration.
· For Configuration 5 (3 L-Class collection stations), with ECS enabled, the event-processing rate was 27.4 events-in/second and 23.6 events-out/second. With ECS disabled, the rates were 54.1 events-in/second and 42.5 events-out/second.

For Configuration 6 (dual-processor L-Class management station with 4 collection stations of varying capability), initial synchronization took 33 minutes.

Large database sizes can also affect the initial synchronization time. In one test, initial synchronization from a single collection station with a 10K database took over two hours. The management station in this case was an L3000 with two 750 MHz processors; having only two processors also increased the synchronization time. (Note also that DNS lookup was enabled in this configuration.)

Resynchronization

When the ovrepld process is stopped and restarted, a full resynchronization between the management station and all collection stations takes place. Resynchronization times vary with the number of changes that occurred in the collection station topologies while ovrepld was stopped. The number of collection stations, whether 10 or 20, does not appear to be a factor in the length of the resynchronization time. Table 9-4 shows the time required to resynchronize the management station if ovrepld is shut down (due to a system failure, planned maintenance, and so on) and interface changes occur on some or all of the nodes in the collection station topologies.


Table 9-4. Resynchronization Times

                        N-Class               L-Class
  # of i/f changes    10 CS     20 CS     10 CS     20 CS
  0                   30 min    30 min    30 min    30 min
  100                 30 min    30 min    30 min    30 min
  1,000               30 min    30 min    30 min    30 min
  10,000              30 min    30 min    30 min    30 min
  100,000             60 min    60 min    75 min    75 min

For Configuration 6 (dual-processor L-Class management station with 4 collection stations of varying capability) with 800 node status changes, resynchronization took 17 minutes.

There are two key performance factors to note:

· The resynchronization times are constant for up to 10,000 changes occurring during the ovrepld down time, regardless of the number of collection stations, the number of CPUs, clock rates, and LAN connectivity.
· Increasing the number of changes tenfold, to the next tested amount of 100,000, only increased the resynchronization time by 2x for the N-Class environments and 2.5x for the L-Class environments.

Event Rate

The event rate that a management station can tolerate was determined by forcing the collection stations to send status change events to the management station at a constant rate for up to eight hours. The performance of the management station was gauged by examining the NNM processes for memory growth, which indicates that buffers are backing up. If the event rate was more than the management station could handle, the pmd and/or ovrepld processes would start growing (as would ovtopmd and ovwdb). Unless the event rate was reduced, the management station's performance would continue to degrade.

A test event was also used to monitor the performance of the management station: a collection station was told to send a specific event to the management station, and the topology of the management station was then queried every 10 seconds to see if the event had triggered the correct action. If no change was made to the management station's topology within 15 minutes, the management station could not support the event rate.

Table 9-5 shows the maximum event rate the management station can sustain based on a continuous event flow from the collection stations.

Table 9-5. Max Event Rate into Management Station

  Management Station    10 CS             20 CS
  N-Class               45 events/sec     45 events/sec
  L-Class               35 events/sec     35 events/sec

NOTE Event rates higher than these maximums cause events to build up in queues, so that by the time an event is propagated to the map/viewer it may be stale or even wrong. Increasing the polling intervals may mask the problem, but it gives a false sense of security: the system will still queue events until even the new, higher polling interval is exceeded. Faster clock rates do provide a benefit; however, the four additional CPUs in the N-Class systems are not a factor, because fewer than four processes run at or near 100% CPU.


NOTE Burst capacity is not the issue here; these figures are the maximum sustained continuous rates. Until burst behavior has been tested directly, apply this rule of thumb: assume that the total number of events received during a burst must average out over the burst recovery period so that the system can return to its normal continuous rate. The normal continuous rate must therefore be lower than the maximum event rate, leaving headroom in which to recover from bursts.

Burst rate tests indicate that 2000 traps/sec can be tolerated for a short time.
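To illustrate the rule of thumb above, assume an N-Class station (45 events/sec sustained maximum, per Table 9-5) normally running at a hypothetical 35 events/sec when a 10-second burst of 2,000 traps/sec arrives:

  Backlog after the burst: (2,000 - 45) events/sec x 10 sec = 19,550 events
  Recovery headroom: 45 - 35 = 10 events/sec
  Recovery time: 19,550 / 10 = 1,955 seconds (about 33 minutes)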

Failover

Management stations may be required to take over the duties of a collection station if that collection station fails. Alternatively, a collection station may take over the duties of another collection station that fails. In either case, the load on the management station or on the failover collection station will increase. The exact increase depends on several factors:

· How many collection stations failed?
· How many nodes does each failed collection station monitor?
· Is there overlap between the nodes monitored by the failed collection stations?

For example, a management station might issue fewer than 100 polls per hour while monitoring four collection stations. That management station might see its polling increase to over 4,500 polls per hour after a single collection station failure, yet to only 14,000 per hour (not four times 4,500) when all four collection stations fail, because the collection stations monitor differing numbers of nodes and some of their nodes overlap.

Note that failover capabilities are only supported with the netmon poller.


10 NNM SE Sizing Considerations


Node and Object Calculation

The network management architect must distinguish between nodes and objects. This distinction is crucial to network performance and configuration issues. In the ovw database, one node is associated with many objects in a one-to-many ratio. Although a network model is typically described in terms of the number of nodes managed by the application, the more important parameter for predicting how the application will scale and perform is the number and type of managed objects. For instance, a segment object requires less processor work to manage and less memory to store than a node or interface object. Determining the exact number and type of managed objects is especially critical when predicting, for example, the size of a database, the load on netmon's polling cycle, or the load imposed on a management station when synchronizing with a collection station.

The suggested multiplier to derive the number of objects per node is 2.4. This number was determined by investigating the node-to-object (networks + segments + nodes + interfaces) ratio in a typical environment: a network comprised mostly of hosts with a single interface attached to a gateway interface. That is, the ratio is calculated as:

  (NETWORKS + SEGMENTS + NODES + INTERFACES) / NODES

Not all network environments conform to the typical environment described above. There are environments (for example, a network center responsible for monitoring only backbone routers) where the suggested multiplier of 2.4 objects per node does not apply. For these networks, the ratio of objects to nodes increases to accommodate enterprise routers, which tend to have more than two physical interfaces. There may also be additional virtual interfaces with secondary addresses, further increasing the number of objects associated with a particular managed node. Items to count include:

· IP addresses
· IPX addresses
· Switch ports
· Virtual circuits (ATM and Frame Relay routers)
· Router-to-router unnumbered interfaces (Cisco routers)

Nodes

A node is Network Node Manager's abstraction of a physical network entity, such as a router, a hub, a printer, a host, a personal computer, a switch, or any other device on the network that is known to NNM. Depending on whether netmon monitors it, a node may be managed or unmanaged.

The command ovtopodump -l issues a summary reporting the number of nodes and managed nodes in the NNM topology database. Sample output from the ovtopodump -l command is presented below.

  NUMBER OF STATIONS:          3
  NUMBER OF NETWORKS:          9
  NUMBER OF SEGMENTS:        102
  NUMBER OF NODES:           606
  NUMBER OF INTERFACES:    1,464
  NUMBER OF GATEWAYS:         16
  NUMBER OF MANAGED NODES:   545

In addition to displaying the number of nodes, this command summarizes the number of other object types found in the topology database: collection stations, networks, segments, and interfaces. A gateway object is actually a special type of node object; this special node contains multiple interfaces and has IP forwarding enabled. The ovtopodump command counts gateway objects in the node count.

In the context of ovtopodump, a "node" is referred to as an object type as well as an abstract model of a network entity. For the purpose of performance and configuration, the reference to node should be understood as an abstract representation of a physical network device in a managed state. Distinguishing managed from unmanaged nodes is important because locally managed nodes consume system, CPU, and network resources. Conversely, very little processing overhead is associated with unmanaged nodes, for two primary reasons:

· Unmanaged nodes are not subject to configuration checks.
· Contained interfaces are not subject to status checks.

Overhead is associated, however, with storing and displaying unmanaged nodes, and with synchronizing unmanaged node information between collection station and management station topologies.
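As a check of the object-per-node multiplier discussed earlier, the sample ovtopodump -l output above works out as follows:

  (NETWORKS + SEGMENTS + NODES + INTERFACES) / NODES
    = (9 + 102 + 606 + 1,464) / 606
    = approximately 3.6 objects per node

This is higher than the typical 2.4 ratio, as expected for a network containing many multi-interface devices; as described later in this chapter, in such cases divide the total object count by 2.4 to obtain the adjusted node count to use in the sizing formulas.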

NOTE If you are using Distributed Internet Monitoring on a management station that receives information from a collection station, the number of non-locally managed nodes is indistinguishable in the local station's ovtopodump -l output. To determine the number of objects owned by a remote collection station, execute the following on the management station:

  ovtopodump -RISCc | wc -l

This returns a count derived from the lines that match the collection station name. On Windows, execute ovtopodump -RISCc >topo.out, then edit topo.out; go to the last line of the file and read the line number from the bottom of the window.

The summary information from ovtopodump -l ignores deleted objects in the topology database; therefore, only non-deleted objects are summarized.1 The ovobjprint -S command differs in that it displays the total number of objects in the object database.2 Do not use the ovobjprint command for calculations.

1 All deleted objects in the topology database can be viewed by executing ovtopodump -RISC | grep -e '*' -e '@' -e '\$' -e '#'. The * distinguishes the removed objects; @, the removed invisible objects; $, the removed secondary invisible objects; and #, the removed secondary objects. For more information, consult the ovtopodump reference page in NNM online help (or man page on UNIX).
2 Caution: The number of objects reported by the ovobjprint -S or -T command reflects only the total number of objects in the object database. ovobjprint is intended as a tool to expose the data associated with a particular node. Its output does not account for whether a node is removed, managed, unmanaged, owned locally, or owned by a collection station. Therefore, the use of this command is discouraged when attempting to predict workload.


The Relationship of Nodes and Interfaces

Each node has one or more associated interfaces in the database. This relationship of nodes to interfaces can be illustrated using the ovtopodump command.

  OBJECT ID(S)   OBJECT            STATUS    IP ADDRESSES
  1101           IP test1.hp.com   Normal    10.10.100.100
  1101/1100      IP test1.hp.com   Normal    10.10.100.100

The sample output above displays the relationship between the node (1101, first line) and its associated interface (1101/1100, second line). If this host had more than one interface, the other interfaces would be listed in the same fashion as the second line. An example of a node with multiple interfaces is provided in the sample output below. Interfaces, like nodes, can be either managed or unmanaged.

  OBJECT ID(S)   OBJECT            STATUS     IP ADDRESSES
  5024           IP test2.hp.com   Marginal   10.10.100.110
  5024/5023      IP test2.hp.com   Normal     10.10.100.110
  5024/7146      IP test2.hp.com   Normal     10.10.100.110
  5024/7147      -  test2.hp.com   Critical   -
  5024/7148      -  test2.hp.com   Critical   -

Managed Nodes and Interfaces

The number of managed nodes reported by ovtopodump cannot, by itself, determine the performance and scalability of an instance of NNM, because the reported number of managed nodes does not reflect the total number of managed objects. Rather, it is the total number of managed nodes combined with the number of managed interfaces that determines the workload on a particular system.


Objects

In the context of performance and configuration, the term objects refers to the sum total of managed nodes, the number of interfaces associated with each node, and the number of networks and segments. If you enable level 2 discovery, a greater number of interface objects may be discovered and stored. If devices in your network support the Bridge MIB and Repeater MIB, many more segment objects may be discovered.

Determining the Number of Objects

In a flat network (such as a network comprised mostly of hosts with single network interfaces attached to hub segments feeding into a default gateway), the number of objects is predictably in the range of 2.4 objects per node. This ratio, however, can change if there are large numbers of multi-homed devices that contain more than one interface per node (such as routers). An Internet Service Provider, for example, that monitors only routers will have a much higher interface-to-node ratio than the flat network described above. As the number of interfaces increases, the number of nodes that can be managed diminishes. If your average number of interfaces per node is greater than 2.4, it is important to accurately account for and estimate the total number of objects managed in your environment.

Estimating the Number of Objects

Estimating the number of objects in the managed environment is a challenge for anyone attempting to size an NNM installation. The following two sections describe the recommended methods for completing this task.


Using NNM

Prior to implementation, start Network Node Manager on a test platform and allow it to discover the target environment.3 To ensure that all nodes are discovered and reported correctly, configure the SNMP configuration database (see xnmsnmpconf) to include the community strings of all gateway and managed devices. This enables the netmon process to query each device for configuration information and accurately model each object. All target devices must be in a managed state.

WARNING If the SNMP community names for the devices are not properly configured, you may not get accurate counts for the numbers of nodes and interfaces.

When the discovery completes, execute ovtopodump -l.4 This provides a summary of the objects on the management station; use it to calculate your node-to-object ratio, then multiply the number of nodes in the environment by this ratio to estimate the total number of objects. If the ratio is greater than 2.4 objects per node, make the appropriate adjustments in the performance and configuration formulas by dividing the total number of objects by 2.4. The resulting value is the node count to be used in the formulas provided in this document (see the worked example below).

To estimate the number of objects exported by a collection station, execute ovtopodump -c | wc -l. This command executes a topology database dump based on the information in the topology MIB on the collection station. If a topology filter is configured for the collection station, the command counts only those objects that pass the filter. On Windows, execute ovtopodump -c >topo.out, then edit topo.out; go to the last line of the file and read the line number from the bottom of the window.
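A worked example of this adjustment, using hypothetical discovery results:

  Discovered in the test domain: 2,000 nodes and 7,000 total objects
  Node-to-object ratio: 7,000 / 2,000 = 3.5 objects per node
  Estimated objects in a full 10,000-node environment: 10,000 x 3.5 = 35,000
  Adjusted node count: 35,000 / 2.4 = approximately 14,600

In this case, use approximately 14,600 rather than 10,000 as the node count in the sizing formulas in this document.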

3 For information on expediting the discovery of the environment, see the netmon reference page in NNM's online help (or man page on UNIX).
4 Note that there is no event signaling that the initial discovery is complete. Determining whether discovery is complete requires detailed knowledge of the target environment and examination of NNM maps and/or the topology object database for the expected results.


Manual Audit

If a version of NNM is unavailable, then a physical inventory of the managed network devices is required. To conduct a manual audit, determine which network devices are multi-homed and which are gateway devices containing physical interfaces that map to multiple virtual interfaces. Multi-homed devices are SNMP-manageable devices, such as routers, switches, and hubs, or network devices with a single physical interface and multiple virtual interfaces. For example, they can be one of the following:

· A router with multiple IP addresses configured for an interface.
· An interface card with both IP and IPX configured.
· A hub or switch.

When calculating the number of interfaces associated with a particular device, observe these general rules (a worked example follows this list):

· If a host or PC has multiple network stacks bound to its interface (as is the case with IPX), then count two interfaces for that device.
· If your environment contains hubs and Ethernet switches that are SNMP manageable and support the Bridge MIB, the Repeater MIB, or the 802.3 MAU MIB, you must count the number of segment objects and associated interfaces. If the hub supports both IP and IPX for administrative purposes, then anticipate two additional interfaces.
· Each IP network has an associated network object. If your site uses IP subnetting, then only the network that maps to the subnet mask is reported as a network; all subnetted IP addresses are included in that network.
· For routers, count the number of physical interfaces and then add any additional virtual IP interfaces associated with that router. For example, if a single interface is bound to two IP addresses, NNM creates an object for each of the IP addresses. Anticipating the correct number of interfaces is especially important for large backbone routers such as Cisco 7000s, which typically have a large number of physical interfaces with many IP addresses bound to each.
· Finally, determine the total number of interfaces, virtual and physical. Virtual interfaces include the reuse of a physical interface by more than one protocol, as well as pseudo interfaces such as SLIP and PPP that come and go with usage. Use these totals to generate the object-to-node ratio in your environment. As stated in the previous section, if the object-to-node ratio is not 2.4, divide the total number of objects by 2.4 to get an adjusted node count to be used in the formulas provided in this document.
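A worked example under hypothetical assumptions: consider a backbone router with 8 physical interfaces, each bound to 2 IP addresses, plus 4 Frame Relay virtual circuits:

  Interface objects: (8 physical x 2 IP addresses) + 4 virtual circuits = 20
  Total objects for the device: 1 node + 20 interfaces = 21

Counting such a device as a single node under the default multiplier would understate its contribution to the workload by nearly a factor of nine (21 / 2.4 = approximately 8.75).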
