Impressive Packet Processing Performance Enables Greater Workload Consolidation Single Intel® Xeon® Processor Platform Achieves Over 80 Mpps Throughput1
Total Page:16
File Type:pdf, Size:1020Kb
Solution Brief Packet Processing on Intel® Architecture Impressive Packet Processing Performance Enables Greater Workload Consolidation Single Intel® Xeon® processor platform achieves Over 80 Mpps throughput1 With Intel® processors, it’s possible support teams, and faster time-to- to transition from using discrete market with a solution benefiting architectures per major workload from the economies of scale of the (application, control plane, packet server industry. Such a notable level and signal processing) to a single of consolidation is achievable through architecture that consolidates the industry-leading performance, I/O workloads into a more scalable and throughput and performance per watt simplified solution. This software- density – all on a standards-based based approach, depicted in Figure 1, platform. Solution providers listed in yields further benefits with Intel’s 4:1 this paper are ready to help equipment workload consolidation strategy. The architects jumpstart their next designs. hardware platform – based on general- purpose server technology – has been optimized using the best practices of 2011 2012 2013 the communications industry to ensure system robustness with core, memory Application and I/O scalability and performance Processing Intel® Xeon® Intel® Xeon® Next improvements, to meet network Processor Processor Generation operator’s low to high-end system C3500/C5500 E5-2600 series requirements. Control Processing Intel® QuickAssist This framework is made possible by Technology the high performance level of Intel Software Library processors plus the Intel® Data Plane Packet Development Kit (Intel® DPDK), which NPU/ASIC Processing greatly boosts packet processing Intel® Data Plane Development Kit performance and throughput, allowing more time for data plane applications. Signal As a result, telecom and network DSP DSP Processing Intel® Signal Processing equipment manufacturers (TEMs/ Development Kit NEMs) can take advantage of lower development costs, fewer tools and Figure 1. Intel’s 4:1 Workload Consolidation Strategy This approach also reinforces What’s Changed? disabling interrupts generated by the synergies resulting from the packet I/O, using cache alignment, The Intel® Xeon® processor E5-2600 convergence of telecom and data implementing huge pages to reduce series, with an integrated DDR3 networks by blending the best of translation lookaside buffer (TLB) memory controller and an integrated communications and computing misses, prefetching, new instructions PCI Express* controller, was a game- competencies. What do both industries and many other concepts. The Intel changer for delivering the small bring to the party in terms of DPDK also runs in user space, thus packet throughput required for next technology and capability? removing the high overhead associated generation networks. This resulted with kernel operations and with The communications industry excels in lower memory latency, decreasing copying data between kernel and user in hardware density, high speed from around 120 nanoseconds (ns) memory space. pipes, low latency transmission, to about 70ns, on par with the arrival robustness, exceptional quality of rate of 64 byte packets. Still, there’s Taking advantage of the breakthroughs service (QoS), determinism and real- far more to do than just store packets enabled by the Intel DPDK, today’s time I/O switching. On the computing to memory, and that’s where the Intel Intel® architecture-based platforms side, there are standards-based DPDK makes a tremendous difference. based on a single Intel Xeon processor technologies that enable dynamic The development kit reduces a E5-2600 series are achieving over 80 resource sharing, workload migration, significant amount of overhead when million packets per second (Mpps) of security, open APIs, developer using an out-of-the-box, standard L3 forwarding throughput for 64 byte communities, virtualization and power Linux* operating system. Significant packets (Figure 2).1 management. time is saved by using core affinity, 80 Mpps † Intel® DPDK Linux User Space 35.2 Mpps † Intel® Data Plane Development Kit (Intel® DPDK) 12.2 Mpps † Linux User Native Linux* Space Stack 64 Byte Throughput Intel® Xeon® Intel® Xeon® Next generation Processor E5645 Processor E5645 Intel® Processor 2 Sockets 1 Socket 1 Socket (6 x 2.4 GHz cores) (6 x 2.4 GHz cores) (8 x 2.0 GHz cores) † Measured performance. IO Performance may vary with platform configuration †† Estimates are based on internal analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Figure 2. Breakthrough Data Plane Performance with Intel® Data Plane Development Kit (Intel® DPDK) L3 Packet Forwarding 2 Higher Packet Processing packet (64 byte) high performance Performance applications. It offers a simple software After testing how many programming model that scales from Intel architecture has a proven track 10 gigabyte pipes could be Intel® Atom™ processors to the latest record of delivering high performance fully utilized using the Intel® Intel® Xeon® processors, providing and innovation over its long history Data Plane Development Kit flexible system configurations to of meeting requirements of various (Intel® DPDK), John Basco, meet any customer requirements embedded communication markets. By Distinguished Engineer at for performance and scalable I/O. integrating a memory controller and Verisign, couldn’t believe his While the Intel DPDK may be used in increasing memory bandwidth, the first performance measurements. various environments, the Linux SMP generation Intel® Core™ i7 processor So, he went to some of his user space environment is the most trusted colleagues and said, family achieved breakthrough common among engineers due to “You’re going to think I’m performance when executing control the ease of development and testing crazy, but I’m getting these and data plane workloads concurrently. of customer applications, along with performance results. There This was possible, in large part, to the maintaining leading edge data packet must be a decimal point in the Intel DPDK, which greatly improved performance that easily outperforms wrong place.” packet processing on Intel architecture. native Linux, as shown in Figure 2. The “They did their own evaluations, The Intel DPDK, consisting of a set L3 forwarding for the 2.40 GHz Intel® and it turned out the numbers of libraries designed for high speed Xeon® processor E5645 on a native were right. The Intel DPDK is data packet networking, is based on Linux stack is about 1 Mpps per core, really what I would consider a simple embedded system concepts and compared to 6 Mpps (64 byte packets) disruptive technology.” allows users to build outstanding small per core using the Intel DPDK. Introduction of 180 Introduction of integrated PCI 160 integrated memory Express* Controller 140 controller and Intel® Data Plane 120 Development Kit (Intel® DPDK) 100 Mpps 80 60 40 20 0 2006 2007 2008 2009 2010 2011 2012 Low Voltage Intel® Xeon® processor Intel® Xeon® processor Intel® Xeon® Intel® Xeon® processor Next generation Next generation Intel® Xeon® processors 5300 Series 5400 Series processor 5540 E5645 Intel® processor Intel processor 2 Sockets 2 Sockets 2 Sockets 2 Sockets 2 Sockets 1 Socket 2 Sockets (2x2.00 GHz cores) (4x2.33 GHz cores) (4x2.33 GHz cores) (4x2.53 GHz cores) (6x2.40 GHz cores) (8x2.00 GHz cores) (8x2.00 GHz cores) Figure 3. IPv4 Layer 3 Forwarding Performance for Various Generations of Intel Architecture Processor-based Platforms 3 With the addition of integrated PCI L3 Forwarding Performance Express controllers, more cores and (Packets/sec) architecture enhancements, the 2nd Generation Intel® Core™ Processor 90 Family provides even greater 80 scalability and flexibility to embedded 70 communication market segments. 60 The Intel DPDK demonstrates the impressive small packet performance 50 achievable using the latest Intel 40 architecture in Figures 2 and 3. See the 30 Appendix for platform information. 20 It should be noted in the case of Packets/second Millions of 10 IPv4 Layer 3 forwarding, the system 0 throughput is I/O limited, constrained by the current PCI Express Generation 64 128 256 512 768 1024 1280 1518 2 network interface cards (NICs); in Packet Size (bytes) other words, the processor’s computing 2 Ports (1C/2T) 4 Ports (2C/4T) 8 Ports (4C/8T) 8 Ports (8C/8T) capacity was not fully utilized. Figure Figure 4. Eight Core Intel® Xeon® Processor E5-2600, L3 Forwarding Performance 4 shows the packet performance for various packet sizes and port usage power management and security. It cases for a single eight core 2.0 GHz is now possible to consolidate packet Intel Xeon processor E5-2600 series processing and network services onto “Telecom equipment vendors with 20MB L3 cache and with Intel® are trying to do more with a single processor architecture, which Hyper-Threading Technology (Intel® HT less R&D spending, and saves hardware and operating cost, Technology) disabled. it’s important for them to eases application development and This impressive performance makes move towards a common improves agility. platform that runs multiple Intel architecture well suited to applications. The dramatic support a wide range of software- packet processing boost from based networking components that Development Kit Overview the Intel DPDK enables our perform routing, firewall, VPN and The Intel DPDK provides Intel customers to converge on a IPS, just to name