K16739: Understanding 'top' output on the BIG-IP system

Non-Diagnostic

Original Publication Date: Mar 13, 2020

Update Date: Oct 2, 2020

Topic

The top command is a utility that provides a real-time view of a running system. When you use the top utility to monitor the BIG-IP system, you should be aware of information that can help you interpret the output accurately.

Description

The top command is a Linux utility that you can use to display system summary information and a list of processes managed by the system. The top utility reads from the /proc file system to gather kernel information, and displays the information in a table format that appears similar to the following example:

Note: The following top output is truncated for brevity.

top - 15:24:56 up 8 days, 4:52, 3 users, load average: 0.31, 0.20, 0.08
Tasks: 252 total, 1 running, 251 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2%us, 0.7%sy, 0.3%ni, 97.4%id, 0.2%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 4063072k total, 3975468k used, 87604k free, 146572k buffers
Swap: 1023992k total, 2796k used, 1021196k free, 468544k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
10510 root      RT   0 2430m 121m  95m S  5.6  3.1 686:26.35 tmm
 7438 root      20   0  487m 104m  28m S  0.7  2.6  68:32.09 mcpd
 7662 root      25   5 50636 9480 6272 S  0.7  0.2   7:59.31 merged
15757 root      20   0  2444 1224  852 R  0.7  0.0   0:00.04 top
    9 root      20   0     0    0    0 S  0.3  0.0   0:23.62 ksoftirqd/1
 7671 root      20   0 35260  19m 3168 S  0.3  0.5 102:25.09 statsd
    1 root      20   0  2180  584  532 S  0.0  0.0   0:06.70 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.13 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.79 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   0:05.57 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:00.92 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:00.86 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
   10 root      RT   0     0    0    0 S  0.0  0.0   0:01.35 watchdog/1
   11 root      20   0     0    0    0 S  0.0  0.0   3:56.73 events/0
   12 root      20   0     0    0    0 S  0.0  0.0   0:50.25 events/1

The top sections are summarized as follows:

Summary area

The first section of top output shows an alphabetical list of global summary fields, such as system uptime information, a summary of tasks, CPU states, and memory and swap usage.

The first line in the Summary shows current time (15:24:56), uptime of the system (up 8 days, 4:52), user sessions logged in (3 users), and the load average (load average: 0.31, 0.20, 0.08). The three values for load average refer to the last minute, five minutes, and 15 minutes.

The load average values represent the number of jobs in the run queue or waiting for disk I/O, averaged over 1-, 5-, and 15-minute intervals. Do not confuse these values with CPU utilization percentage, which measures the share of a time interval during which tasks actively used the CPU. CPU utilization appears in the Cpu(s) line.

To interpret the load averages, you need to know the number of CPUs in the system, because the reported values are relative to that number. For example, on a single-CPU system, a load average of 1.00 indicates that the system is using the CPU efficiently, with the CPU fully occupied and no tasks waiting. A value higher than 1.00 indicates that demand for the CPU is high and the system is queueing CPU time requests. A value below 1.00 indicates that CPU time is not in demand, so no queuing occurs.

On multi-processor systems, the load is relative to the number of processor cores available. The equivalent fully loaded value is 1.00 on a single-core system, 2.00 on a dual-core system, 4.00 on a quad-core system, and so on.
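As a rough illustration of that scaling, the following Python sketch divides a load average by the core count. The function name load_per_core is hypothetical and is not part of any F5 or Linux tool. A result near or above 1.0 suggests that CPU time requests are queueing.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average scaled by the number of processor cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


# The 1-minute load average from the sample output (0.31) on a
# hypothetical dual-core system.
print(round(load_per_core(0.31, 2), 3))
```

On a live system, os.cpu_count() from the Python standard library is one way to obtain the core count.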

The load average reported by the top command comes from the /proc/loadavg pseudo file. The /proc/loadavg file contains five fields. The first three fields are the load averages over 1, 5, and 15 minutes. The fourth field consists of two numbers separated by a slash: the first is the number of currently runnable kernel scheduling entities (processes and threads), and the second is the number of kernel scheduling entities that currently exist on the system. The fifth field is the PID of the process that was most recently created on the system.

Example:

# cat /proc/loadavg
0.36 0.17 0.24 1/456 14974

In this example, the load average is very low. The fourth field (1/456) shows that, out of the 456 kernel processes and threads, only one was runnable when the information was gathered. When more processes are runnable, the load average is higher.
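The five fields can be separated programmatically. The following Python sketch is one possible way to do so. The function name parse_loadavg and the dictionary keys are illustrative choices and are not part of any standard tool.

```python
def parse_loadavg(text: str) -> dict:
    """Split the five fields of /proc/loadavg into named values."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load_1min": float(one),
        "load_5min": float(five),
        "load_15min": float(fifteen),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(last_pid),
    }


# Sample line from the example above; on a live system, read the text
# with open("/proc/loadavg").read() instead.
sample = "0.36 0.17 0.24 1/456 14974"
info = parse_loadavg(sample)
print(info["load_1min"], info["runnable"], info["total"], info["last_pid"])
```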

Tasks: The Tasks line displays a summary of the processes on the system. Each process can be in one of several states: running, sleeping, stopped, or zombie.

CPUs: The Cpu(s) line shows the percentage of CPU time spent in each mode. If you add the percentages, the total should equal 100 percent of the CPU. The CPU time fields are as follows:

us, user: CPU time spent running un-niced user processes
sy, system: CPU time spent running kernel processes
ni, niced: CPU time spent running niced user processes
id, idle: CPU time spent idle
wa, I/O wait: CPU time spent waiting for I/O completion. For a given CPU, the I/O wait time is the time during which the CPU was idle and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).
hi: CPU time spent servicing hardware interrupts
si: CPU time spent servicing software interrupts
st, steal time: CPU time during which the real CPU was not available to the current virtual machine

Memory: The Mem line shows system usage of physical memory. Linux allocates most available free memory to buffers and disk caching, so Linux utilities such as top and free may report that only a small amount of memory is free. This is normal behavior; the kernel can quickly reclaim cached memory when a program needs it. Although free+buffers+cached is a traditional measure of free or free-able memory, it is not particularly accurate. The MemAvailable value in /proc/meminfo is much more accurate. BIG-IP 13.0 and later, which use CentOS 7.x kernels, can provide that measurement.

Swap: The Swap line shows the memory available in a paging file (swap file) on non-volatile storage. Typically, this is 1GB on vCMP guests and BIG-IP VE, or 5GB on bare-metal systems. The BIG-IP system often pages out unused or less-used 4KB memory pages to free up RAM for more valuable uses. The system cannot page out huge (2MB) pages, and it typically provisions 50-90% of the RAM as huge memory pages.

Smaller memory systems with multiple modules may often have nearly all swap in use (1GB swap file size), and that need not be an issue if MemAvailable is still high enough. Other systems typically use less swap. It's also normal for pages to stay in the swap file until they are referenced, when the kernel attempts to move them back to RAM. If transient memory pressure is present, the kernel may transfer unused pages to the swap file. When the pressure on RAM diminishes, the pages remain in the swap file.

If there is persistent memory pressure in RAM due to undersizing, incorrect provisioning, or a memory leak, swap use can reach high levels. The underlying issue is a shortage of free and free-able RAM, which is typically indicated by a persistent downward trend in MemAvailable (from /proc/meminfo) and increased resident plus swap use by a process.
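To compare the traditional free+buffers+cached estimate with MemAvailable, the following Python sketch parses /proc/meminfo-style text. The function names are illustrative. The sample text is hypothetical: MemFree, Buffers, and Cached reuse the numbers from the sample top output, and the MemAvailable value is invented for demonstration.

```python
import re


def parse_kb_fields(text: str) -> dict:
    """Collect 'Name: 12345 kB' pairs from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def traditional_free_kb(fields: dict) -> int:
    """Older estimate of free or free-able memory: free + buffers + cached."""
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


# Hypothetical /proc/meminfo excerpt. MemFree, Buffers, and Cached reuse
# the numbers from the sample top output; MemAvailable is invented.
SAMPLE_MEMINFO = """\
MemTotal:        4063072 kB
MemFree:           87604 kB
MemAvailable:     650000 kB
Buffers:          146572 kB
Cached:           468544 kB
"""

fields = parse_kb_fields(SAMPLE_MEMINFO)
print("free+buffers+cached:", traditional_free_kb(fields), "kB")
# MemAvailable is missing on kernels that predate it, so use .get().
print("MemAvailable:", fields.get("MemAvailable"), "kB")
```

Sampling MemAvailable at intervals and watching for a persistent downward trend is one way to apply the guidance above. On a live system, read the text with open("/proc/meminfo").read().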

Column headings

The default columns are defined as follows:

PID

The Process ID that identifies a process.

USER

The user name of the owner of the process.

PR

The priority of the process. Some values in this field may list RT, which indicates that the process is running under real-time scheduling.

NI

The nice value of the process. Lower values mean higher priority.

VIRT

The amount of virtual memory used by the process. This reports all process memory maps, including large files on disk such as shared libraries. This represents how much memory the process can access at the present moment.

RES

The resident memory size. Resident memory is the amount of non-swapped physical memory a task is using. This represents the actual space a process occupies in physical memory.

SHR

SHR is the shared memory used by the process. This is a memory space accessed by multiple processes for the same piece of data, such as shared libraries. This represents how much of the VIRT size is actually shareable.

S

The S column indicates one of the following process status values:

D - uninterruptible sleep. The process cannot be interrupted and is waiting for something to happen. For example, the process is waiting on the disk and does not respond to signals.
R - running. The process is currently running.
S - sleeping. The process is waiting for something to happen and can be interrupted by sending a signal.
T - traced or stopped. The process is stopped either because it received a stop signal (SIGSTOP or SIGTSTP) or because it is being traced. For example, Ctrl-Z was used to suspend the process, or a debugger stopped the process using ptrace.
Z - zombie. The process has terminated and is waiting for its parent process to collect its exit status. It generally disappears on its own.

%CPU

The percentage of CPU time the task has used since the last update.

Note: The default top output in F5 iHealth may be misleading because it is calculated as total process CPU use divided by the uptime of the unit. The result is a lifetime average and cannot reflect recent changes from that average. A warning about this appears on the top command page.

%MEM

The percentage of available physical memory used by the process.

TIME+

The total CPU time the task has used since it started (represented as minutes:seconds.hundredths).

COMMAND

The command that was used to start the process.
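The difference between the refreshed %CPU value and a lifetime average, as described in the %CPU note, can be sketched with the TIME+ column. The following Python example is a minimal sketch, not the iHealth implementation. It assumes that TIME+ values have the minutes:seconds.hundredths form seen in the sample output and that the process has been running since boot. Both function names are hypothetical.

```python
def time_plus_to_seconds(value: str) -> float:
    """Convert a TIME+ value such as '686:26.35' to seconds.

    Assumes the minutes:seconds.hundredths form seen in the sample output.
    """
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def lifetime_cpu_percent(cpu_seconds: float, uptime_seconds: float) -> float:
    """Average CPU use over the whole uptime, as a percentage of one CPU."""
    return 100.0 * cpu_seconds / uptime_seconds


# Values from the sample output: tmm has TIME+ 686:26.35 and the system
# has been up 8 days, 4:52.
cpu_seconds = time_plus_to_seconds("686:26.35")
uptime_seconds = 8 * 86400 + 4 * 3600 + 52 * 60
print(round(lifetime_cpu_percent(cpu_seconds, uptime_seconds), 2))
```

Because the result averages over the entire uptime, a process that became busy only recently would still show a low figure. This is why the refreshed %CPU column is the better indicator of current activity.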

Task area

The tasks area displays a list of processes running on the system and shows the different states in which the process resides.

Recommendations

When reviewing system summary information, such as CPU and memory statistics, on the BIG-IP system, you should consider the following factors:

When using top to monitor BIG-IP CPU utilization, you should let top refresh several times to get an accurate CPU reading. This is important because the top utility calculates CPU utilization by looking at the change in CPU time values between samples. If the Traffic Management Microkernel (TMM) sleep cycle falls between samples, the CPU reading may not reflect accurate statistics.

When running top in interactive mode, press 1 to display statistics for all CPU cores individually.

To display which CPU was last used by a process, press f to select fields, then press j to select Last used cpu, then press Enter.

When running top in interactive or batch mode, press H to display individual process threads. By default, top displays a summation of the process threads rather than the individual threads. For example, when displaying top without individual threads, the %CPU for a given process reflects the total CPU usage by all the threads for that process, which may total more than 100%.

F5 recommends that you use the Configuration utility or the TMOS Shell (tmsh) command to view system summary information on the BIG-IP system.
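The sample-to-sample calculation mentioned in the first recommendation can be approximated from the aggregate cpu line of /proc/stat. The following Python sketch illustrates the idea and is not the top source code. It assumes a kernel that reports at least eight counters on that line, in the order user, nice, system, idle, iowait, irq, softirq, steal. The counter values and function names are hypothetical.

```python
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")


def read_cpu_times(stat_line: str) -> dict:
    """Map the first eight counters of a /proc/stat 'cpu' line to top's labels."""
    values = stat_line.split()[1:1 + len(FIELDS)]
    return dict(zip(FIELDS, (int(v) for v in values)))


def cpu_percentages(before: dict, after: dict) -> dict:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical counter values, in clock ticks, taken a few seconds apart.
first = read_cpu_times("cpu 1000 30 500 90000 200 0 100 0")
second = read_cpu_times("cpu 1012 33 507 90974 202 0 102 0")
usage = cpu_percentages(first, second)
print({name: round(value, 1) for name, value in usage.items()})
```

Because only the difference between two samples is used, a single short interval can catch a process during an unrepresentative moment. Averaging several intervals gives a steadier reading.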

Supplemental Information

K16419: Overview of BIG-IP memory usage
K15468: Overview of BIG-IP TMM CPU usage
K14358: Overview of Clustered (11.3.0 and later)

Applies to:

Product: BIG-IP 15.X.X, 14.X.X, 13.X.X, 12.X.X, 11.X.X