The Importance of Jumbo Frames in Gigabit and 10- Gigabit Networks

www.small-tree.com AAbboouutt SSmmaallll TTrreeee CCoommmmuunniiccaattiioonnss

Copyright © Small Tree Communications 2004. All rights reserved

Small Tree Communications LLC was founded by a talented group of high performance networking and kernel engineers that recognized a unique opportunity in the Apple G5 platform.

We decided that rather than work for proprietary Unix vendors who are steadily losing market share, or arguing with open source Linux developers over whether it was important to have zero-copy networking for sockets, we found in Apple a company that's focused on its customers, provides the features they want, and has struck a balance between making their source available to the community while maintaining enough control to assure they could react to the business needs of their customers.

Small Tree Communications has created a range of solutions for your G5 and Xserve platforms including 10Gb , Dual port and 802.3ad . All designed so that you (our customers) can finally buy an enterprise server that works like you want it to work, is easy to use, and is kind of stylish too!

Small Tree Communications has offices in California, Minnesota and Wisconsin. You can find out more about us by visiting http://www.small-tree.com or dropping us a note at [email protected]! With today’s Ethernet components of 20 years ago. standard, computer networks can now That means these high-speed systems are transfer over 10 billion bits of still limited to exchanging data with no information every second. However, at more than 1500 byte chunks, which is these new high speeds, Ethernet’s far from optimal for today's servers and maximum size of 1500 bytes networks. becomes a serious problem and prevents applications and operating systems from Optimal Frame Size Optimal frame taking full advantage of this high size is a function of traffic, network and performance. To draw a comparison, a host characteristics. Bulk transfers and processor would have to handle roughly steady, intense traffic tend to benefit 1600 packets of data per second in order from larger frame sizes, as do high- to drive a 10Mbit Ethernet interface at bandwidth, high-reliability networks and line rate both sending and receiving. more powerful server platforms. With However to drive a 10Gb interface the proliferation of servers and the would require the CPU to handle 1.6 increasing distribution of applications million packets. over multiple collaborating server appliances, continuous transfers of large To mitigate this huge increase in the blocks of data between servers are on the number of packets arriving per second, rise. Add to this the overhead vendors have adopted a de facto standard fragmenting and reassembling data into known as “jumbo frames”. Typically, small packets and the cost to process these are 9000 bytes, although some each packet, and it's not surprising that vendors do support a larger size. most servers cannot keep up with the Using frames this large mitigates the new generation of high-speed LANs. number of packets that must be processed by a factor of 6. So rather than Subsequently, other high-speed processing 1.6 million packets per networking technologies have tended to second, a CPU need only process better reflect the needs of today's servers 266,666 packets. This reduces the and networks. FDDI, for instance, has a number of iterations through the stack, maximum transmission unit (MTU) of the number of protocol specific packets 4500 bytes. The default MTU for AAL5 (like ACKS and the like) and greatly over ATM is 9000 bytes, while Fiber reduces the amount of CPU cycles Channel and the High Performance required to drive the interface. Parallel Interface (HIPPI) typically use a maximum MTU of 65280 bytes Collaboration between compute and data (theoretically Fiber Channel’s MTU is servers or backup of server data can unlimited). result in huge data transfers, generating high network and CPU loads. The built But the popularity and installed base of in interfaces on today’s Apple these technologies pales in comparison workstations still keep to the Ethernet to Ethernet, putting users in a difficult frame sizes designed and optimized for spot. Ideally users would like to marry the relatively modest data transfer needs the best attributes from these niche of host computers and much slower technologies with standard Ethernet. Due to cost considerations and the desire interface card (NIC). In most older to more easily manage homogeneous Ethernet implementations each time a network topologies, designers favor packet is received by the adapter, it Ethernet for end system connections, interrupts the host to inform it that: 1) it despite the fact that many of today's has received a packet and 2) to stop what servers are hampered by a maximum it is doing and process the packet. Each size that is substantially of these interrupts consumes a less than optimal for their needs. This significant number of host processor creates a new opportunity to improve cycles. On a lightly loaded server, the server performance and network added burden on the processor might not by extending the maximum matter much. But on a heavily loaded frame size in high-speed Ethernet system, dramatic performance networks. improvements are seen when the processor is freed from this constant Application Performance Application stream of interrupts. This is particularly performance depends largely on server important on Gigabit Ethernet and 10 and network . Server Gigabit Ethernet networks where servers throughput is primarily a function of the may be receiving millions or even tens server’s processing power and load. of millions of packets per second. These Router and switch efficiency, in addition newer Ethernet standards implement to the amount of raw bandwidth various interrupt coalescing mechanisms available, directly drive network to reduce the total number of host throughput. interrupts, however jumbo frames alone will reduce the non-coalesced load by a Server Considerations In heavy traffic factor of 6. conditions, servers send and receive larger frames much more efficiently than Network Considerations Router and smaller ones. The increased efficiency switch efficiency is determined primarily results from the fact that it takes fewer by how much time they spend examining larger frames to transfer the same packet headers and determining how amount of data than with existing packets should be forwarded. Ethernet packets. As there is a significant amount of fixed processing Examining Headers Overhead for overhead per frame, processing overhead packet header parsing and making becomes proportional to the number of forwarding decisions is clearly frames presented to the system. proportional to the number of packets. Because routers examine many header Sending and Receiving Packets Much fields and make complex decisions, of the server overhead for transmitting a larger frames dramatically increase their packet is independent of the size of the efficiency. packet. For example, parsing and building the packet header takes the Headers Consume Network same amount of time for a large packet Bandwidth Headers are the same size as a small one. On the receive side, for all IP packets, whether big or small. fewer frames means fewer "packet Thus, headers consume proportionally received" interrupts from the network less network bandwidth within larger packets. Though headers are generally network. Right? Not necessarily. small (no more than about a hundred Ethernet error detection techniques put a bytes), they can consume a significant practical upper limit on frame size. percentage of network bandwidth Ethernet adapters, when transmitting particularly under heavy load conditions frames, insert a 32-bit Frame Check where millions or billions of small Sequence (FCS) into every packet. The packets are being transmitted. Therefore FCS is a number derived mathematically larger packets significantly reduce the from all the other bits in the frame. amount of raw network bandwidth being When the frame is received, the consumed. receiving adapter performs the same mathematical operation on the frame. If Performance Improvements The this operation does not yield the same benefits of large frame sizes on a busy four-byte number in the FCS header server had been demonstrated in several field, the frame has been corrupted in public performance tests. One such test transmission and is discarded. With showed a 50 percent reduction in server Ethernet, the FCS computation uses a CPU utilization when using 9018 byte- 32-bit cyclic redundancy check (CRC- sized Ethernet packets as opposed to 32). CRC-32 error checking detects bit 1518 byte frames while throughput errors with a very high probability. But increased by almost 50 percent from 409 as frame size increases, the probability Mbps to over 602 Mbps. This means not of undetected errors per frame may only increased throughput but also more increase. Due to the nature of the CRC- server cycles available to process 32 algorithm, the probability of applications. Though it’s difficult to undetected errors is the same for frame fully quantify CPU cycles savings sizes between 3007 and 91639 data bits associated with the use of extended (approximately 376 to 11455 bytes). frames, some approximations can be Thus to maintain the same bit error rate made. For a typical server accuracy as standard Ethernet, extended implementation, it takes approximately frame sizes should not exceed 11455 1,200 CPU cycles to process the IP and bytes. TCP headers of a single Ethernet frame (1,000 machine instructions times 1.2 In addition, an efficient frame size can CPU cycles per instruction). A 9018 be derived based on memory page sizes byte extended Ethernet frame can carry in host computers. Memory page sizes (4 the of six standard Ethernet Kbytes, 8 Kbytes, 16 Kbytes) used by frames with the overhead of only one the majority of commercial systems Ethernet frame. This saves the host from make multiples of 4Kbytes plus total processing five packet headers and header space (1200 bytes) attractive results in a savings of 6,000 CPU cycles frame sizes from the point of view of (at minimum). For a 10 MB file transfer, minimizing copy operations. this translates into a savings in excess of eight million CPU cycles. Lastly, an optimum frame size can be selected based on the block sizes used by How Large Should Frames Be? So, the the most popular applications. For larger the frame size, the better - given instance, a network file system (NFS) plenty of traffic and a very reliable is 8400 bytes. NFS is the most common file sharing protocol in Newer IP protocol stacks have per route networked environments. A 9018 Kbyte or per path MTUs which make it easier frame size is attractive by to deploy extended frames for a accommodating a single NFS datagram particular application. UDP protocol in one Ethernet packet and staying stacks may send up to 64 comfortably within the standard Ethernet Kbytes, as requested by the application. bit error rates. But there is no mechanism to negotiate UDP MTU with the peer station. If UDP Compatibility Issues A major concern datagrams that exceed the MTU of any in adopting extended frame sizes for intermediate network are sent, IP high-speed Ethernet is backward fragmentation will automatically occur. compatibility. Theoretically, IP protocol stacks and routing entities can usually be The Solution The most important issue configured to support MTUs of up to 64 in the extended Ethernet frame Kbytes. Applications running on TCP discussion is the ability to have seamless over IP should not have any problems operation between devices that support concerning MTU compatibility because larger frames and those that do not. the two end stations negotiate a common Extended Ethernet frames can be easily MTU when the TCP connection is implemented and controlled within established. The station with the larger legacy Ethernet environments by MTU "throttles back" and uses the MTU partitioning the reach of these frames, of the other station. either physically or through the use of standard tagging techniques such as The Pitfalls However, there may be IEEE 802.1Q. Since extended frame issues affecting intermediate IP routers. sizes yield the most benefit in large For instance, two end stations might transfers such as backup and data both support extended frames, but an replication, their application can intermediate IP network might not. In frequently be confined to a server farm that case, an IP router may have to or power workgroup. The simplest way fragment the packets. Fragmentation is to partition large frames is to build undesirable because it places an separate back-end server LANs or additional burden on the router and on workgroup LANs over which extended the receiving station, which has to frames flow freely among devices that reassemble the packets. Even worse, if support them. This is an expensive the DON'T_FRAGMENT bit is set in scheme as it requires a separate adapter the IP header of the packets, the router on each host. However, the adapter cost will drop the packets instead of often pales in comparison to the savings fragmenting them. The router usually in valuable server cycles and increased sends an ICMP DESTINATION productivity. A more cost-efficient way UNREACHABLE - to achieve the same result is via the use FRAGMENTATION NEEDED message of VLANs. The IEEE 802.1Q to the sending station in this case. This specification is an emerging standard for causes the station to reconfigure its IP tagging Ethernet frames with a VLAN protocol stack for a smaller MTU. In ID. Devices that implement such a this case, the sender has to "throttle scheme will support multiple VLANs or back" by sending smaller packets. IP subnets on a physical port. protocol, and stays comfortably within Integrating Extended Ethernet the standard Ethernet bit error checking Frames Into Existing Ethernet limits. By partitioning extended frames Infrastructures to application-specific VLANs or Using the 802.1Q mechanism, large physical LANs compatibility with legacy frames are tagged and partitioned in a equipment becomes a “non-issue.” VLAN in which all equipment (i.e. switches and adapters) support extended frame sizes. Compared to the physical partitioning method, this allows an Ethernet adapter to support both standard Ethernet frames and extended frames over the same physical link. This effectively eliminates interoperability problems resulting from forcing extended frames on devices that only support standard frames. There are other schemes which would allow transparent use of extended frame sizes for both new and legacy Ethernet equipment without requiring any partitioning schemes. Once available, they would allow for the seamless integration and operation of extended frames within traditional client-server networks regardless of device support.

Conclusion Higher speed networks are driving the need for more efficient packaging of data. With new 10 Gigabit- class networks comes the requirement to maximize the efficiency of the attached end systems. In heavily loaded networks or server LANs where continuous data transfer is required, current Ethernet frame sizes can actually degrade performance - negating many of the initial benefits of high-speed Gigabit networks. Extended frames significantly enhance the efficiency of Ethernet servers and networks by reducing host packet processing by the CPU and increasing end-to-end throughput. A 9018 maximum frame size fits well with server page sizes of 4Kbytes and 8 Kbytes, accommodates the NFS