K16739: Understanding 'Top' Output on the BIG-IP System

K16739: Understanding 'top' output on the BIG-IP system Non-Diagnostic Original Publication Date: Mar 13, 2020 Update Date: Oct 2, 2020 Topic The top command is a Linux utility that provides a real-time view of a running system. When using the top utility to monitor the BIG-IP system, you should be aware of information that may help you accurately interpret the output. Description The top command is a Linux utility that you can use to display system summary information and a list of processes managed by the system. The top utility reads from the /proc file system to gather kernel information, and displays the information in a table format that appears similar to the following example: Note: The following top output is truncated for brevity. top - 15:24:56 up 8 days, 4:52, 3 users, load average: 0.31, 0.20, 0.08 Tasks: 252 total, 1 running, 251 sleeping, 0 stopped, 0 zombie Cpu(s): 1.2%us, 0.7%sy, 0.3%ni, 97.4%id, 0.2%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 4063072k total, 3975468k used, 87604k free, 146572k buffers Swap: 1023992k total, 2796k used, 1021196k free, 468544k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10510 root RT 0 2430m 121m 95m S 5.6 3.1 686:26.35 tmm 7438 root 20 0 487m 104m 28m S 0.7 2.6 68:32.09 mcpd 7662 root 25 5 50636 9480 6272 S 0.7 0.2 7:59.31 merged 15757 root 20 0 2444 1224 852 R 0.7 0.0 0:00.04 top 9 root 20 0 0 0 0 S 0.3 0.0 0:23.62 ksoftirqd/1 7671 root 20 0 35260 19m 3168 S 0.3 0.5 102:25.09 statsd 1 root 20 0 2180 584 532 S 0.0 0.0 0:06.70 init 2 root 20 0 0 0 0 S 0.0 0.0 0:00.13 kthreadd 3 root RT 0 0 0 0 S 0.0 0.0 0:00.79 migration/0 4 root 20 0 0 0 0 S 0.0 0.0 0:05.57 ksoftirqd/0 5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0 6 root RT 0 0 0 0 S 0.0 0.0 0:00.92 watchdog/0 7 root RT 0 0 0 0 S 0.0 0.0 0:00.86 migration/1 8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1 10 root RT 0 0 0 0 S 0.0 0.0 0:01.35 watchdog/1 11 root 20 0 0 0 0 S 0.0 0.0 3:56.73 events/0 12 root 20 0 0 0 0 S 0.0 0.0 0:50.25 events/1 The top sections are summarized as follows: Summary area The first section of top output shows an alphabetical list of global summary fields, such as system uptime information, a summary of tasks, CPU states, and memory and swap usage. The first line in the Summary shows current time (15:24:56), uptime of the system (up 8 days, 4:52), user sessions logged in (3 users), and the load average (load average: 0.31, 0.20, 0.08). The three values for load average refer to the last minute, five minutes, and 15 minutes. The load average values represent the number of jobs in the run queue or waiting for disk I/O averaged over 1, 5, and 15-minute intervals. Do not confuse these values with CPU utilization percentage; CPU utilization percentage is based on a time interval of how long a task actively used the CPU. This information appears under CPU. To interpret the meaning of the load averages, you need to know the number of CPUs a system contains because this directly relates to the ratio value reported by the averages. For example, on a single CPU system, a load average value of 1.00 indicates that the system is efficiently using the CPU. If the value is higher than 1.00, it indicates that the demand for the CPU is high and tasks and the system is queueing CPU time requests. If the value is below 1.00, CPU time is not in demand, so no queuing occurs. On multi-processor systems, the load is relative to the number of processor cores available: 1.00 on a single-core system, 2.00, on a dual-core, 4.00 on a quad-core, and so on. The information for the load average reported in the top command comes from the /proc/loadavg pseudo file. The /proc/loadavg file contains five fields; the first three fields are the load average over 1, 5, and 15 minutes. The fourth field consists of two numbers separated by a slash. The first number is the number of currently runnable kernel processes, threads. The second number following the slash is number of kernel processes, threads that currently exist on the system. The fifth field is the PID of the process that recently created on the system. Example: # cat /proc/loadavg 0.36 0.17 0.24 1/456 14974 In this example, the load average is very low. The fourth column (1/456) shows that, out of the 456 kernel processes, only one is currently running at the time the information is gathered. If more processes are running, the load average is a higher value. Tasks: The Tasks line displays a summary of processes running on the system. The processes can be in one of several different states. CPUs: The CPU state shows a percentage of CPU time in different modes. If you add the percentages, the total should be equal to 100 percent of the CPU. The meaning of different CPU times include the following: us, user: CPU time in running (un-niced) user processes sy, system: CPU time in running kernel processes ni, niced: CPU time in running niced user processes wa, I/O wait: CPU time waiting for I/O completion. For a given CPU, the I/O wait time is the time during which the CPU was idle and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request). hi: CPU time serving hardware interrupts si: CPU time serving software interrupts st: CPU 'st' (steal time) represents the time in which the real CPU was not available to the current virtual machines. Memory: The Memory section shows system memory usage for physical memory. Linux allocates most available free memory to buffers and disk caching, and Linux utilities, such as top and free, may report that only a small amount of memory is free. This is normal behavior; a program can quickly reclaim cached memory if it needs it. While free+buffers+cached is a traditional measure of free or free-able memory, it's not particularly accurate. MemAvailable from cat /proc/meminfo is much more accurate. BIG-IP 13.0 and later, which use CentOS 7.x kernels, can provide that measurement. Swap: The Swap section shows memory available on a paging file (swap file) on non-volatile storage. Typically, this is 1GB on vCMP guests and BIG-IP VE or 5GB on bare-metal systems. The BIG-IP system often pages out unused or less-used 4KB size memory pages to free up RAM for more valuable uses. The system cannot page out huge (2MB size) pages, and it typically provisions 50- 90% of the RAM as huge memory pages. Smaller memory systems with multiple modules may often have nearly all swap in use (1GB swap file size), and that need not be an issue if MemAvailable is still high enough. Other systems typically use less swap. It's also normal for pages to stay in the swap file until they are referenced, when the kernel attempts to move them back to RAM. If transient memory pressure is present, the kernel may transfer unused pages to the swap file. When the pressure on RAM diminishes, the pages remain in the swap file. If there is persistent memory pressure in RAM due to undersizing, incorrect provisioning or a memory leak then swap can reach high levels. The issue is a downward trend in free and free-able RAM, and this is typically indicated by a persistent downward trend in MemAvailable (from /proc/meminfo) and increased resident+swap use by a process. Column headings The default columns are defined as follows: PID The Process ID that identifies a process. USER The user name of the owner of the processes. PR The scheduling priority of the process. Some values in this field may list RT, which indicates that the process is running under real time. NI The nice value of the process. Lower values mean higher priority. VIRT The amount of virtual memory used by the process. This reports all process memory maps, including large files on disk such as shared libraries. This represents how much memory the process can access at the present moment. RES The resident memory size. Resident memory is the amount of non-swapped physical memory a task is using. This represents the actual space a process occupies in physical memory. SHR SHR is the shared memory used by the process. This is a memory space accessed by multiple processes for the same piece of data, such as shared libraries.

Load more