ENEA Multicore: High Performance Packet Processing Enabled with a Hybrid SMP/AMP OS Technology
Patrik Strömblad, Chief System Architect OSE, Enea

Multicore technology introduces a great challenge to the entire software industry, and it causes a significant disruption to the embedded device market, since it forces much of the existing software to be redesigned according to new, not yet established principles.

Abstract

Moore's law still holds, but processor vendors are rapidly turning to using the additional transistors to create more cores on the same die instead of increasing the clock frequency, since this not only gives more chip performance but also decreases the power consumption (watts/MIPS).

This paper starts by describing the widely accepted multiprocessing software design models and some of their benefits and drawbacks. After that, a few simple packet processing use cases are described, aiming to illuminate the pain points of a strict AMP multiprocessing approach. Finally, we introduce Enea's OSE® Multicore Edition (Enea OSE MCE) and show how this new hybrid SMP/AMP RTOS technology can provide a homogeneous, scalable and portable application framework for high-speed packet processing applications within the data and connectivity layers, while at the same time being a feature-rich SMP RTOS for networking control protocols. Enea OSE MCE defines a very low-level multicore processor abstraction model which gives high portability and at the same time provides a low-overhead device programming model. The migration of Enea OSE applications designed for distribution scalability to multicore devices has proven to be a fairly straightforward task, as it preserves the existing software architecture investments.

Introduction

The embedded software industry is facing the challenge of truly starting to think and design in parallel, in order to fully make use of the new multicore devices. Up until now, we have been able to gain higher performance simply by upgrading the processor, but there is no "free lunch" any more. This raises a great challenge on the OS side as well, since the requirement to parallelize also applies within the OS. The goal for a multicore RTOS must be to provide excellent support for the application to maximize its performance and scale with more cores, and at the same time maintain standard RTOS real-time characteristics such as determinism and interrupt latency. The RTOS must provide a simple, flexible and uniform programming environment that offers capabilities such as load balancing, boot loading, a file system and networking.

The ongoing convergence between the telecom and datacom domains is creating extreme requirements on the nodes in the network to handle very high bandwidth of primarily IP packets. In most cases, network traffic processing can be divided into two major categories, slow path processing and fast path processing:

- Slow path processing: This involves network protocol control signaling to configure and establish the data paths, and it also involves packets that terminate on this node in higher protocol layers that use the socket interface.

- Fast path processing: This involves IP forwarding and other kinds of intermediate processing of the actual data packets, such as NAT, security, etc. It is critical to minimize overhead in fast path processing, since the I/O bandwidth is dominant and the CPU cycle budget for software execution is limited.

Forwarded IP packets may still be handled on the slow path if they, for any reason, need a larger CPU processing budget than the fast path allows. The categorization into slow and fast path is thus a way to categorize the cost of processing rather than the functionality.
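To make the categorization concrete, the sketch below shows how a data-plane dispatcher might decide whether a received IPv4 packet stays on the fast path or is punted to the slow path stack. All types and helper functions (fib_lookup, punt_to_slow_path, and so on) are hypothetical placeholders for illustration only; they are not part of any Enea OSE or vendor API.

```c
/*
 * Illustrative sketch only (hypothetical names, not an Enea OSE API):
 * a data-plane dispatcher deciding whether a received IPv4 packet can be
 * forwarded on the fast path or must be punted to the slow path stack.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pkt {
    uint8_t  ip_version;     /* 4 for IPv4 */
    uint8_t  ip_header_len;  /* in 32-bit words; >5 means IP options present */
    uint8_t  ttl;
    uint32_t dst_addr;       /* destination IPv4 address, host byte order */
};

/* Placeholder helpers; a real system would query its forwarding table (FIB)
 * and local address table here. */
static bool fib_lookup(uint32_t dst_addr, uint32_t *next_hop)
{
    (void)dst_addr;
    *next_hop = 0x0a000001;          /* pretend every destination resolves to 10.0.0.1 */
    return true;
}

static bool addr_is_local(uint32_t dst_addr)
{
    return dst_addr == 0x0a0000fe;   /* pretend 10.0.0.254 terminates on this node */
}

static void forward_on_fast_path(struct pkt *p, uint32_t next_hop)
{
    printf("fast path: forward to next hop 0x%08x (ttl %u)\n",
           (unsigned)next_hop, (unsigned)(p->ttl - 1));
}

static void punt_to_slow_path(struct pkt *p)
{
    printf("slow path: hand packet (dst 0x%08x) to the TCP/IP stack\n",
           (unsigned)p->dst_addr);
}

/* The fast path handles only plain transit traffic with a FIB hit; anything
 * that terminates locally, carries IP options or is about to expire needs
 * the full protocol stack. */
static void dispatch_packet(struct pkt *p)
{
    uint32_t next_hop;

    if (p->ip_version != 4 || p->ip_header_len > 5 || p->ttl <= 1 ||
        addr_is_local(p->dst_addr) || !fib_lookup(p->dst_addr, &next_hop)) {
        punt_to_slow_path(p);
        return;
    }
    forward_on_fast_path(p, next_hop);
}

int main(void)
{
    struct pkt transit = { 4, 5, 64, 0x0a000002 };   /* plain transit traffic */
    struct pkt local   = { 4, 5, 64, 0x0a0000fe };   /* terminates on this node */

    dispatch_packet(&transit);
    dispatch_packet(&local);
    return 0;
}
```

The design point is that the fast path only touches packets for which a small, bounded amount of work suffices; anything that needs the full protocol machinery is handed over to the slow path.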
The slow path is not the subject of discussion in this paper, since TCP/IP stacks are generally not multicore-adapted. The fast path category is the area with the highest bandwidth increase, and therefore many people seek ways to offload these features from the legacy stacks and create new design models for them on multicore devices.

The next section describes the common multiprocessing variants that are typically evaluated in discussions around multicore software solutions.

Various Multicore Processing Models in SW system design

There are fundamentally three multiprocessing models used to describe system designs on multicore devices: the SMP model, the AMP model and the bare metal model. These models have a number of benefits and drawbacks, which are briefly described below.

The SMP Model

The Symmetric Multi Processing model is the model used in the design of several enterprise OSes such as Linux, as well as in the design of their application domain. In such OSes and their applications, data is to a large extent shared, and a number of different locking mechanisms and atomic operations are used frequently for synchronization. The SMP model is easy to manage from a software management perspective, since it creates a good abstraction where the OS facilitates best-effort CPU load balancing, and it has been used in the server and desktop application domain for a very long time. Enterprise OSes like Linux and Windows provide a best-effort execution platform for these kinds of CPU-intensive applications.

The high degree of hardware resource abstraction is in many cases an advantage, but the abstraction layer introduces substantial overhead when the application becomes as I/O intensive as it tends to be in embedded packet forwarding/routing applications. The shared memory programming model, both at the application level and inside the Linux kernel, is based on mutable shared objects in memory, and this is an inherent bottleneck to scalability in multicore systems; it will inevitably lead to poor scaling to many cores.

This, together with the fact that the complex SMP implementation of such kernels in many cases has the drawback of not being deterministic, makes the classic SMP OSes less suitable as an RTOS for high-speed packet processing in the long run.
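The scalability problem caused by mutable shared objects can be illustrated with a small, self-contained example. The sketch below uses generic POSIX threads primitives and hypothetical counter names; it is not tied to Linux internals or to any Enea API. A single lock-protected statistics counter serializes all cores on one lock and one cache line, whereas per-core counters that are aggregated only when read let each core proceed without contention.

```c
/*
 * Illustrative sketch only: why frequently written shared state limits
 * SMP scaling, and how per-core state avoids the contention.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_CORES 64

/* Naive shared object: every forwarded packet takes the same global lock,
 * so the lock and the counter's cache line bounce between all cores. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t packets_forwarded_shared;

void count_packet_shared(void)
{
    pthread_mutex_lock(&stats_lock);
    packets_forwarded_shared++;
    pthread_mutex_unlock(&stats_lock);
}

/* Scalable alternative: each core writes only its own, padded slot, and the
 * slots are summed only when the statistic is actually read. */
struct per_core_stats {
    uint64_t packets_forwarded;
    char pad[64 - sizeof(uint64_t)];   /* keep slots on separate cache lines */
};
static struct per_core_stats stats[MAX_CORES];

void count_packet_per_core(unsigned core_id)
{
    stats[core_id].packets_forwarded++;   /* no lock, no shared writes */
}

uint64_t read_total(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < MAX_CORES; i++)
        sum += stats[i].packets_forwarded;
    return sum;
}

int main(void)
{
    /* Single-threaded demo run; the point is the contrast in sharing, not timing. */
    for (unsigned i = 0; i < 1000; i++) {
        count_packet_shared();
        count_packet_per_core(i % 4);   /* pretend there are 4 fast path cores */
    }
    printf("shared counter: %llu, per-core total: %llu\n",
           (unsigned long long)packets_forwarded_shared,
           (unsigned long long)read_total());
    return 0;
}
```

The same reasoning applies to any frequently written shared structure on the packet path, such as flow tables or statistics blocks.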
The AMP Model

The Asymmetric Multi Processing model uses an approach where each core runs its own complete, isolated operating system or application framework (an alternative term for a more lightweight RTOS). This also leaves the door open to choosing different RTOSes on different cores. The advantage of an AMP system is that high performance is achieved locally and that it scales well to several cores. Using the AMP model together with virtualization techniques is also a way to reuse legacy single-core designs.

The drawback of the AMP model is that the OS provides no support to the distributed application for load balancing or OS resource management. The configuration, load and startup of such an application is also inherently complex to design.

"Bare Metal" Model

The "Bare Metal" model is a single-threaded execution model where the available APIs are processor-vendor specific. Since no regular RTOS exists for these threads, a common approach is to run a regular operating system, such as Linux, on one or several cores, and let the rest of the cores execute a "bare-metal" thread using an application framework that creates an abstraction of the hardware layer. The advantage is of course that maximal performance and minimal overhead is achieved when running without an RTOS, but the disadvantage is that the software becomes hardware-specific, which forces a redesign of the applications whenever the hardware is upgraded. Also, the parts of the system running without an RTOS or application framework take on the role of a black box, i.e. there is no observability except through the external interfaces. Support for tracing, debugging or post mortem dumps is not available, and therefore the amount of code "out there" must naturally be kept to a minimum. Over time, though, the need for more functionality in these parts will most likely grow, which in turn increases the need for better device abstraction.

Case study: Packet processing use cases using an AMP system design

The AMP model is not new to the embedded world. It has been deployed in many systems where DSPs or network processors are put together as multicore clusters dedicated to performing a specific task (see Figure 1).

Figure 1. AMP processing models (figure labels: functional pipelining through cores; parallel, symmetric processing of packet processing functions on cores).

Some examples of use cases applicable to the AMP model:

- A set of Digital Signal Processors (DSPs) that perform signal processing, such as transcoding, on flows of packets.
- A set of network processors that perform IP fast path processing.

In many cases these flows have state (e.g. multimedia transcoding, IP tunneling), which means that all packets belonging to a flow must be "pinned" and passed in order through the same sequence of cores. The OS provides no support for state migration between cores, and adding support for state check-pointing, thread migration and balancing of packet flows can be quite complex.
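As an illustration of how such flow pinning is commonly done, the sketch below hashes the 5-tuple of a packet so that all packets of the same flow are dispatched to the same fast path core. The names and the hash function are illustrative assumptions only, not a description of Enea OSE MCE.

```c
/*
 * Illustrative sketch (hypothetical names, not an Enea OSE API): pinning a
 * packet flow to one fast path core by hashing its 5-tuple, so that stateful
 * processing (transcoding, tunneling) sees all packets of the flow in order.
 */
#include <stdint.h>
#include <stdio.h>

struct flow_key {
    uint32_t src_addr;
    uint32_t dst_addr;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;
};

/* Mix the 5-tuple fields with a simple multiplicative hash; any stable hash
 * works as long as every dispatching core uses the same one. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t h = k->src_addr;
    h = h * 31u + k->dst_addr;
    h = h * 31u + (((uint32_t)k->src_port << 16) | k->dst_port);
    h = h * 31u + k->protocol;
    return h;
}

/* All packets of one flow map to the same core, preserving per-flow state
 * and packet order without any cross-core state migration. */
static unsigned pick_core_for_flow(const struct flow_key *k, unsigned num_fastpath_cores)
{
    return flow_hash(k) % num_fastpath_cores;
}

int main(void)
{
    struct flow_key k = { .src_addr = 0x0a000001, .dst_addr = 0xc0a80001,
                          .src_port = 12345, .dst_port = 80, .protocol = 17 };

    printf("flow pinned to fast path core %u of 8\n", pick_core_for_flow(&k, 8));
    return 0;
}
```

Static hashing keeps per-flow state local to one core, but it does not by itself solve rebalancing; moving a flow still requires the kind of state check-pointing and migration support discussed above.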