Staying out of the Swamp

Staying out of the server swamp

Richard Baum
Perforce Software
October, 2001

Contents

Introduction
How do I tell if I'm in the swamp?
Is your system CPU bound?
Is your system memory bound?
Is your system I/O bound?
How can Perforce cause server swamp?
Network Attached Storage
Confusing and complex client mappings
Background processes
The Perforce error log
Gigantic operations
Conclusion

Introduction

Perforce runs extremely well when it is given the right resources. A Perforce server does not generally require much CPU. Memory and disk requirements correspond to the amount of data you wish to store. Conditions can sometimes conspire to change a well-performing server into a poorly-performing one. This talk will cover some of the things to watch out for to keep your Perforce server happy and healthy. The object of this talk is to familiarize you with what to look for so you can determine where the problem lies, and what to do so you can remedy the problem.

In general, performance that a user will see is limited by the I/O bandwidth of the server and the speed of its connection with a client machine. A server that appears to not be responding in its typically speedy fashion may, in fact, be swamped with data and requests for data.

How do I tell if I'm in the swamp?

If you suspect that your Perforce server is swamped, the first things to do are to check whether it is, in fact, running, and to examine the machine that hosts the server for any obvious signs of a problem. The main areas for problems have to do with the CPU, memory, and I/O operations. Most operating systems provide tools with which you can easily get a good understanding of what is happening. A basic understanding of how these tools work will go a long way towards figuring out what is wrong with your server.

Is your system CPU bound?

A Perforce server running on Unix is typically run from a daemon, or parent process. That process then spawns child processes that handle user requests.
The server can also be run from inetd, eliminating the need for a parent process.

Check to see whether your server system is using an abnormal amount of CPU. If there are no free processor cycles then you may have a problem, though it is not necessarily a problem with Perforce.

Check to see what Perforce server processes are running. On System V style Unix systems (Solaris, Linux) you can use the ps -ef command to do this. Berkeley Unix systems (FreeBSD) use ps -axl and have similar output.

To determine which process is doing what, look at the output of the ps command. The second column, "PID", lists the process ID of the process. The third column, "PPID", lists the ID of that process's parent. In the example below, the parent process, 795, has spawned two children, 1909 and 1911.

Abbreviated process table output

    chinadoll:reb% ps -ef
    UID       PID  PPID  C  STIME     TTY    TIME  CMD
    perforce  795  680   0  10:38:39  pts/4  0:00  ./p4d -p 1667 -r .
    perforce  1909 795   7  11:59:25  pts/4  0:33  ./p4d -p 1667 -r .
    perforce  1911 795   9  11:59:41  pts/4  0:09  ./p4d -p 1667 -r .

A Perforce server under Windows runs via multiple threads under a single process ID. You can use the NT or Windows 2000 Task Manager to determine the overall CPU utilization of your server. The system Performance Monitor (Programs->Administrative Tools->Performance Monitor) provides additional functionality and allows you to monitor individual thread performance. However, all of the threads are part of the same process and you cannot control threads individually, so the Task Manager's indication that the p4d or p4s process is using a large percentage (or all) of the available CPU is usually enough information to proceed.

Is your system memory bound?

The next step in determining what might be wrong is to look at the amount of physical memory in your machine, the amount of swap space defined, and how much of each is free or in use.
Systems that have just enough memory may run fine for a time, but a large operation may cause swapping, which will slow things considerably.

On Unix systems, you can determine the amount of free memory and swap space with the vmstat command. It takes an argument, the number of seconds to wait between samples. The "avm" or "swap" column indicates the number of free virtual memory (swap) pages or Kbytes. The "free" column indicates the number of free pages or Kbytes of RAM.

vmstat output of a Solaris system

    chinadoll:reb% vmstat 3
    procs memory page disk faults cpu
    r b w swap free re mf pi po fr de sr dd dd f0 s0 in sy cs us sy id
    0 0 0 1532968 496240 118 315 0 22000000328512154 1594
    0 0 0 1532968 496240 118 315 0 20003000337514169 2593
    0 0 0 1532912 496312 468 334 28 10220 028000386865173 21 8 71
    0 0 0 1532760 508968 1537 315 113 84000 0950005161991 176 79 18 3
    0 0 0 1532736 542296 1541 315 113 70000 0910005081909 157 81 18 1
    0 0 0 1532728 557656 1402 315 102 08000 0890005051804 181 70 16 14

To interpret the results better, you also need to know how much swap space has been configured and how much RAM the system has. To determine the amount of swap space configured, use the swap -s command (Solaris) or the swapinfo command (FreeBSD).

swap -s output of a Solaris system:

    chinadoll:reb% swap -s
    total: 22232k bytes allocated + 4520k reserved = 26752k used, 1534024k available

From this we can see that nearly all of the swap space configured on the system is still available, so the system is not having a problem due to a shortage of memory.

Use the dmesg command to determine the amount of physical memory installed on the server. You will likely have to wade through the output, but this information is written at boot time and should appear somewhere near the top.
Partial dmesg output of a Solaris system:

    Sep 9 21:45:42 chinadoll unix: [ID 389951 kern.info] mem = 655360K (0x28000000)
    Sep 9 21:45:42 chinadoll unix: [ID 930857 kern.info] avail mem = 638574592

From this and the vmstat output above, we can see that out of a total of 640 MB of RAM (655360 KB) there were between 496240 KB and 557656 KB of free memory at the time of the vmstat run. We can also see, from the last three columns of the vmstat output, the percentage of user, system, and idle time the CPU had at the time of each line of output.

Even without knowing the amount of memory or swap space, vmstat can tell you a lot. Look at this output from a different Solaris system as it begins to swap and then stops swapping.

vmstat output of a Solaris system that is swapping:

    procs memory page disk faults cpu
    r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id
    0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33
    1 0 0 532216 18940 7 2281 226 8 8 0 0 9 2 53 511 2892 600 14 62 24
    6 0 0 528912 7652 15 2159 44 232 818 3072 3027386112488 613 32 61 6
    5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0
    6 0 0 525804 7136 21 2381 1310 672 1840 1500 599 24 5 21 637 2741 634 31 69 0
    2 0 0 527992 7880 9 1349 3405 557 2066 1100 684 7 4 133 675 1457 643 16 45 39
    1 0 0 530208 6780 13 1261 3553 1170 3580 1220 1079 32 4 165 725 1375 661 12 43 46
    1 0 0 526996 7028 7 855 36 181 530 1576 149 10 2 4 569 4596 539 66 23 11
    0 0 0 527548 8440 16 541 65 114 250 1408 55 14 1 1 505 2002 559 44 20 36
    0 0 0 530664 11008 2 499 25 2 0 1032 0 16 6 1 383 1209 51161579
    1 0 0 531460 35916 0 370 130 0 0 756 0 4 6 0 367 866 51932473
    0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89

The output shows the system using over 100 MB of swap space and then going back to not swapping. The rows with 0 in the final (id) column show that CPU utilization peaks at the start of the swapping, with zero idle time.
The CPU utilization goes down shortly thereafter, as the system then has to spend much more time waiting for tasks to be swapped in and less time performing actual work. The page-in and page-out figures (pi and po) peak here while the CPU gets some breathing room. While swapping, the de column shows the number of pages of memory that the system thinks it will be short, while the pi and po columns show the number of KB brought in and out of swap.

Windows NT and 2000 provide a "Task Manager" facility that allows you to monitor CPU and memory utilization. You can bring it up by pressing Ctrl-Alt-Delete and selecting the "Task Manager" button.
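
On the Unix side, the by-eye reading of vmstat output described above could be scripted. The following is a minimal Python sketch, not a Perforce tool. It assumes Solaris-style output with a column header row that begins with "r" and contains "free". The function names parse_vmstat and under_memory_pressure are hypothetical, and treating a non-zero sr (page scan rate) value as the sign of memory pressure is an assumption of this sketch rather than a rule stated in this talk. Rows whose field count does not match the header are skipped rather than guessed at.

```python
# Sketch only: assumes Solaris-style vmstat text; names here are hypothetical.
SAMPLE = """\
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s1 s2 s3 in sy cs us sy id
0 0 0 644688 28240 0 1263 608 5 5 0 0 103 2 5 762 4440 714 26 40 33
6 0 0 528912 7652 15 2159 44 232 818 3072 3027386112488 613 32 61 6
5 0 0 521648 7112 10 2369 94 222 330 1640 44 35 8 2 781 2569 742 30 70 0
0 0 0 645488 123780 0 108 20 0 0 516 0 5 5 3 376 830 457 4 7 89
"""

# Columns looked up by name; each of these appears only once in the header.
NAMED = ("swap", "free", "pi", "po", "de", "sr")


def parse_vmstat(text):
    """Return one dict per well-formed sample row of vmstat output."""
    samples = []
    columns = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # The column header row starts with "r" and names the "free" column.
        if tokens[0] == "r" and "free" in tokens:
            columns = tokens
            continue
        # Skip the group header and any row whose fields have run together.
        if columns is None or len(tokens) != len(columns):
            continue
        if not all(t.isdigit() for t in tokens):
            continue
        values = [int(t) for t in tokens]
        row = {name: values[columns.index(name)] for name in NAMED}
        # "sy" appears twice in the header, so take the CPU columns by position.
        row["us"], row["sys"], row["id"] = values[-3:]
        samples.append(row)
    return samples


def under_memory_pressure(row):
    """Assumption of this sketch: a non-zero scan rate means memory is short."""
    return row["sr"] > 0


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        flag = "PRESSURE" if under_memory_pressure(row) else "ok"
        print(flag, "free:", row["free"], "KB", "idle:", row["id"], "%")
```

The sample rows are taken from the swapping example above, including one row whose columns have run together so that the skip logic is exercised. In practice the text would come from capturing the output of a command such as vmstat 3 rather than from a string constant.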
