Tuning and Testing the FreeBSD 6 TCP Stack
Lawrence Stewart, James Healy
Centre for Advanced Internet Architectures, Swinburne University of Technology, Melbourne, Australia
CAIA Technical Report 070717B, July 2007
[email protected], [email protected]

Abstract— Tuning an operating system's network stack is crucial in order to achieve optimum performance from network intensive applications. This report discusses some of the key factors that affect the network performance of FreeBSD 6.x based hosts. It then explains how to tune the FreeBSD 6.x network stack and how to easily test and evaluate the changes made.

Index Terms— FreeBSD, TCP, Tuning, Networking, Performance

I. INTRODUCTION

The widely used FreeBSD UNIX-like operating system provides a mature, stable and customisable platform, suitable for many tasks including hosting network intensive applications and telecommunications research. The operating system's network stack plays one of the most important roles in internetworking, as it is the avenue through which all packets enter and exit a device on the network.

Current TCP/IP stacks have evolved along with the various incremental standards documents that have appeared over the years, further increasing the complexity of an already sophisticated part of the operating system. Most of the standards have been optional additions to the protocol stack, and have been implemented as such in most operating systems.

CAIA's NewTCP project [1] mandated the use of the FreeBSD [2] operating system as the platform for performing our TCP research. As such, we performed some investigative research into how the FreeBSD TCP/IP stack operates. This report aims to capture the information learnt during this process for the benefit of future experimental TCP research using FreeBSD.

Whilst FreeBSD 6.2-RELEASE was used for the FreeBSD TCP/IP stack research we undertook, most of the technical information in this report will be applicable to all FreeBSD 6.x releases, and possibly to earlier (5.x and, to a lesser extent, 4.x) or upcoming (7.x and beyond) releases.

II. CONFIGURABLES

A. The TCP Host Cache

The host cache stores connection details and metrics in order to improve the future performance of connections between the same hosts. At the completion of a TCP connection, a host caches information about that connection for some defined period of time. If a new connection between the same two hosts is initiated before the corresponding cache entry has expired, the new connection will use the cached details to "seed" its internal variables. This allows the connection to reach optimal performance significantly faster, as TCP does not need to go through its usual steps of learning the optimal parameters for the connection.

The cached details include the Round Trip Time (RTT) estimate to the remote host, the path Maximum Transmission Unit (MTU), the slow start threshold, the congestion window and the bandwidth-delay product (BDP). The hc_metrics struct found in <netinet/tcp_hostcache.c> lists the full set of connection details stored in the host cache.

The FreeBSD sysctl variables that affect the TCP host cache and their default values are shown in Listing 1.

Listing 1
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 4
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.purge: 0
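The host cache state can also be inspected at run time, which is useful when verifying tuning changes. The commands below are a minimal sketch: the first prints the variables shown in Listing 1, while net.inet.tcp.hostcache.list is a read-only variable that dumps the current cache entries; its availability and output format may vary between releases.

sysctl net.inet.tcp.hostcache
sysctl net.inet.tcp.hostcache.list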
net.inet.tcp.hostcache.cachelimit specifies the maximum number of cache entries that can be stored in the host cache. It is calculated as the product of the net.inet.tcp.hostcache.hashsize and net.inet.tcp.hostcache.bucketlimit variables. If the host cache ever reaches capacity, it simply begins overwriting previous entries, which is not detrimental to the working of the system at all.

net.inet.tcp.hostcache.count lists the number of entries currently in the host cache.

net.inet.tcp.hostcache.expire lists the default timeout value (in seconds) for cache entries. A cache entry will expire net.inet.tcp.hostcache.expire seconds after creation if a new connection to the same host is not made before the expiration. If a new connection is made, the corresponding cache entry's expiry timer is updated as well.

net.inet.tcp.hostcache.purge, when set to a non-zero value, forces all current entries in the host cache to be removed the next time the cache is pruned. After the prune is completed, the value is reset to 0.

In FreeBSD, the host cache is checked for stale entries every 5 minutes by default. This value can be modified by changing the <netinet/tcp_hostcache.c> source file and recompiling the kernel. The "TCP_HOSTCACHE_PRUNE" define specifies the period between checks for stale cache entries.

Under normal circumstances, the default of 5 minutes is fine. However, when doing experimental research, we ideally limit or remove the effect of variables other than the one being studied. Given that cached connection settings can alter the dynamics of a TCP connection, it is advisable to nullify the effects of the host cache. This can easily be achieved by changing "TCP_HOSTCACHE_PRUNE" to, for example, 5 seconds, recompiling the kernel, and then setting net.inet.tcp.hostcache.expire to 1, as illustrated in the sketch below.

This will ensure all cache entries are removed within 5 seconds of being created at the termination of a connection. It is then simply a matter of waiting at least 5 seconds between test runs to ensure the host cache is not affecting results. If this is too long a wait, you can decrease the time between checks even further.
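One possible workflow for applying these changes is sketched below. It assumes kernel sources installed under /usr/src and the stock GENERIC kernel configuration; the exact form of the TCP_HOSTCACHE_PRUNE define may differ slightly between releases, and the KERNCONF name should be adjusted to suit your installation.

# 1. Edit /usr/src/sys/netinet/tcp_hostcache.c so that the
#    TCP_HOSTCACHE_PRUNE define specifies 5 seconds instead of
#    5 minutes, then rebuild and install the kernel:
cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
reboot
# 2. After rebooting into the new kernel, expire cache entries
#    1 second after they are created:
sysctl net.inet.tcp.hostcache.expire=1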
During our investigation of the effects of the host cache on connections, we uncovered a quirk in the FreeBSD 6.2-RELEASE TCP stack. When a connection is made to a new host (one for which no host cache entry exists), and the RFC3390 TCP configuration option is enabled (see section II-B), the initial congestion window should be, and is in fact, set to 4380 bytes, as recommended in RFC3390. However, when a host cache entry does exist, the initial congestion window is set by a different piece of code.

This piece of code is supposed to use the congestion window value from the host cache to seed the congestion window for the new connection. However, we discovered this does not occur correctly, due to an incorrect assumption in the code, resulting in the congestion window always being set to 1 segment. The discussion of this matter on the FreeBSD "net" mailing list is archived at [3]. The end result is that using the host cache to seed the initial congestion window actually degrades the performance of subsequent connections between hosts, and obscures the intended behaviour of RFC3390. It is therefore advised that "TCP_HOSTCACHE_PRUNE" and net.inet.tcp.hostcache.expire be changed as previously described to mitigate the quirk in situations where it may be problematic, e.g. experimental research.

Note that setting the net.inet.tcp.hostcache.cachelimit tunable to 0 does not disable the host cache; instead, it causes the kernel to panic whenever it goes to add a cache entry.

B. TCP Extensions

A range of additional extensions to TCP have been proposed since the initial TCP standard was released. RFC4614 [4] provides an excellent summary of these various proposals, with references to the Request For Comments (RFC) documents that propose them.

For the extensions supported by FreeBSD, the sysctl interface [5] provides a way of enabling or disabling each extension and any applicable configuration variables. In the following discussion, unless explicitly stated otherwise, enabling a sysctl variable means setting it to a value of "1", and disabling it means setting it to a value of "0". For example, to enable the TCP SACK extension, you would run the following command in a shell:

sysctl net.inet.tcp.sack.enable=1

The most relevant FreeBSD sysctl variables that affect the TCP stack and their default values are shown in Listing 2.

Listing 2
net.inet.tcp.sack.enable: 1
net.inet.tcp.rfc1323: 1
net.inet.tcp.rfc3042: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.delayed_ack: 1
net.inet.tcp.delacktime: 100ms

The full range of sysctl variables affecting FreeBSD's TCP stack can be found by running the following command in a shell:

sysctl -a | grep tcp

net.inet.tcp.sack.enable enables selective acknowledgments [6], [7], [8]. Enabling SACK does not guarantee it will be used with all TCP connections, as it is a negotiated option that both sides must support. It is advised that this variable be enabled.

net.inet.tcp.rfc1323 enables support for the window scaling TCP extension [9]. This should be enabled to facilitate high bandwidth transfers. As with SACK, this is a negotiated option that both ends of the connection must support.

A connection can experience reduced throughput as a result of delayed ACKing (controlled by the net.inet.tcp.delayed_ack and net.inet.tcp.delacktime variables in Listing 2) if non-zero packet loss is present. This behaviour occurs if the connection's congestion window ever becomes equivalent to 1 segment size, which can happen if there is no data in flight after exiting fast recovery mode, or during the initial startup of the connection. A congestion window of 1 segment will result in the sending host sending only 1 segment, which will not be ACKed by the receiver using delayed ACKs until the delayed ACK timer expires.
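Where this delayed ACK interaction would perturb experimental results, delayed ACKing can be disabled like any other variable in Listing 2. The command below is a sketch only; disabling delayed ACKs increases the volume of ACK traffic and is not generally recommended on production hosts.

sysctl net.inet.tcp.delayed_ack=0

To make the setting persist across reboots, the same assignment can be added to /etc/sysctl.conf.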