Networking

Multicore Processing: and Syed Shah, Nikolay Guenov

Virtualization and Its Impact Virtualization helps to: Figure 1: Virtualization is a combination of • Run multiple operating systems on a Benefits of VirtualizationBenefits of Virtualization software and hardware features that single , including Windows®, creates virtual CPUs (vCPU) or virtual Linux® and more In-Service Software Upgrades systems-on-chip (vSoC). These vCPUs • Increase energy efficiency, reduce Dynamic Resource Allocation or vSoCs are generally referred to as hardware requirements and thereby virtual machines (VM). Each VM is reduce overall capital expenditure an abstraction of the physical SoC, Cost Flexibility • Determine highest availability OEM and Functional ConsolidationReduction Value Scalability complete with its view of the interfaces, and performance for enterprise Board-Level Consolidation memory and other resources within the applications Reliability physical SoC. Virtualization provides and • Use computing resources efficiently Protection the required level of isolation and Sandboxing Applications Multiple OSs in a Single SoC partitioning of resources to enable Virtualization in each VM. Each VM is protected from Embedded Space Virtualization and interference from another VM. The virtualization layer is generally called the Although they are not mainstream, these Multicore Processing monitor (VMM). IT industry trends are trickling down into With multicore SoCs, given enough the embedded space as well. Observed processing capacity and virtualization, Why Virtualization? trends for virtualization in the embedded control plane applications and data space include: plane applications can be run without Virtualization has already impacted the one affecting the other. Data plane server and IT industries in a significant • The concept of having a sea of and control plane applications, in most way. IT organizations are using it to processors, and the associated cases, will be mapped to different reduce power consumption and building processing capacity sliced and diced cores in the multicore SoC as shown in space, providing high availability for among applications and processes, is Figure 2. critical applications and streamlining no longer science fiction. application deployment and migration. • The challenges of extracting higher Control and data plane applications The trends to adopt virtualization in the utilization from the processors, and are not the only application- server space also are being driven by consolidation triggered by cost level consolidation that will occur. the desire to support multiple operating reduction, are driving the adoption of Virtualization and partitioning will allow systems and consolidation of services virtualization in embedded systems. OEMs to enable their customers to on a single server by defining multiple • With virtualization, one can merge the customize service offerings by adding VMs. Each VM operates as a stand- control and data plane processing their own applications and operating alone device. Because multiple VMs onto the same SoC. Previous systems to the base system on the can run on a single server (provided the approaches used separate discrete same SoC, rather than using another server has enough processing capacity), devices for these functions. discrete processor to handle it. Data IT gains the advantage of reduced or control traffic that is relevant to the Virtualization offers three major benefits server inventory and better server customized application and operating to the embedded industry: utilization. system (OS) can be directed to the 1. Cost reduction via consolidation appropriate virtualized core without 2. Flexibility and scalability impacting or compromising the rest of 3. Reliability and protection the system.

24 Beyond Bits Power Architecture Edition

Figure 2: Control and Data Plane Application 1) OS-hosted and 2) bare-metal . Each approach has its pros ControlConsolidation and Data Plane inApplication a Virtualized Multicore SoC and cons and the choice would depend Data Plane Processing Control Plane Processing SoC SoC on the applications and the market

µp µp µp µp segments. OS-Hosted Hypervisor

Data Plane Traffic Switch Control Plane Traffic The hypervisor schedules the guest operating systems based on the Mixed Control and Data Plane Traffic Application scheduling policies in the hypervisor. Consolidation Scheduling of the VMM and guest Data Plane Control Plane operating systems is dependent on µp µp µp µp Single Multicore SoC Partitioned and Virtualized the scheduling policies of the host OS,

Mixed Control and because they run on top of the host OS. Data Plane Traffic Switch

Mixed Control and Figure 3: Data Plane Traffic OS-hostedOS-Hosted Hypervisor Hypervisor

Another example of consolidation of Addressing Challenges with Virtual Virtual functions is board-level consolidation. Virtualization Machine 1 Machine 2 Functions that were previously Addressing Partitioning Challenge implemented on different boards now Partitioning can be defined as Hypervisor can be consolidated onto a single card subdividing resources of a SoC in a and a single multicore SoC. Virtualization Device Host OS manner that allows the partitioned Drivers can present different virtual SoCs to the resources to operate independently of applications. With increasing SoC and one another. Partitioned resources can Hardware application complexity, the probability be mapped either explicitly to the actual of failures due to software bugs and hardware or to the virtualized hardware. SoC mis-configuration are greater than Note that the system can be partitioned purely hardware-based failures. In Bare-Metal Hypervisor without being virtualized. For example, such a paradigm, it may make sense The bare-metal hypervisor approach in a SoC that allows partitioning but not to consolidate application-level fault does not depend on the host OS and virtualization, each Ethernet interface can tolerance onto a single multicore SoC, runs directly on the physical hardware be assigned to a partition but a single where a fraction of the cores are set (bare metal). The hypervisor fully Ethernet interface cannot be assigned aside in hot standby mode. While such controls the SoC, enabling it to provide to two different partitions at same time. a scheme will save the cost of having quality of service guarantees to the However, if the SoC also provides to develop a standby board or at the guest operating systems. virtualization capabilities, then a single very least another SoC, it would require Ethernet interface can be virtualized and the SoC to be able to virtualize not only Figure 4: each virtual Ethernet interface can be the core complex but also the inputs/ Bare-Metal Hypervisor presented to a different partition. Bare-Metal Hypervisor outputs (I/Os).

Although virtualization has its Addressing Fair Sharing and Virtual Virtual advantages, it comes with new Protection Challenge Machine 1 Machine 2 challenges and considerations, including A hypervisor is a software program partitioning, fair sharing and protection that works in concert with hardware Guest OS 1 Guest OS 2 of resources between multiple/ virtualization features to present the VM Device Hypervisor competing applications and operating to the guest OS. It is the hypervisor that Drivers systems. The following sections will creates the virtualization layer. There discuss how virtualization technology are two broad architectural approaches Hardware addresses these challenges. when it comes to virtualizing the system:

freescale.com 25 Networking

Different I/O handling approaches can Advantages of Virtualization Based on the system and the be used by the hypervisor, including fully application, memory associated with On the processor, a new hypervisor virtualized I/O, dedicated I/O and para- a VM may need to be protected and state introduced by Freescale and some virtualized I/O. In the fully virtualized partitioned from the other VMs. For traditional compute-centric companies approach, the hypervisor virtualizes example, in a mixed control and data automatically traps privileged and the I/O by emulating the devices in plane system where control and data sensitive calls by the guest OS to the software. The software overhead plane applications run on the same hypervisor, removing the need for binary thus created reduces the efficiency SoC, it is imperative that memory code rewriting or para-virtualization. This of the system. In the para-virtualized be protected and partitioned so that reduces the complexity and overhead approach, the I/O interfaces are not fully these applications run completely introduced by the hypervisor and also virtualized. independently and the state of one improves overall performance of the partition cannot erroneously or The key difference between full I/O system. maliciously corrupt the state of the other virtualization and para-virtualization is The second area where hardware partition. that not all functions are emulated in support improves hypervisor para-virtualization; hence, this approach Finally, the last major area where performance is memory management. reduces software overhead at the cost hardware support for virtualization will In an un-virtualized system, a virtual of OS portability. In the dedicated greatly reduce hypervisor overhead memory address is formed by using the approach, each VM is assigned a is the I/O. With increasing desire to effective address. The virtual address dedicated I/O in its own partition consolidate applications and workloads is then translated to the physical and does not have to go through the onto single multicore SoCs, the density address by the memory management hypervisor for I/O transactions once of VMs per SoC is bound to increase. unit. The system uses the physical set up, resulting in the lowest software However, this trend can have significant address to either fetch or store the overhead. performance impact as more and more data or instruction from memory. If the applications become bound by network Unlike servers or compute-centric system has multiple processes running I/O. In traditional virtualized systems, systems, one key design metric for on it, another level of abstraction is the hypervisor manages I/O activity embedded systems is performance introduced, and the virtual address of different VMs by using a virtual of the system per watt of power is calculated using the effective switch. The virtual switch resides inside dissipation. That is, the system should address and the process ID. That is, the hypervisor and is responsible for be optimized to extract the best different processes having the same negotiating traffic exchange between possible performance within a given effective address will be represented the VMs and the I/Os. The virtual switch power budget. Usually the power by a different virtual address and parses and classifies incoming packets budget of embedded systems is more consequently a different physical and forwards them to the appropriate constrained than that of the servers address. VM. The virtual switch does the same or compute-centric systems. While A virtualized system can have many for the packets it receives from the VMs, portability and flexibility are important, VMs running on a single system and and forwards them to the appropriate often they are not the number one each VM can have multiple processes network I/O. As the number of VMs concern. As such, the bare-metal running on it. Memory is allocated to increase and network I/O speeds hypervisor approach offers the best each VM and in turn to the different move from 1 Gigabit Ethernet (GbE) virtualization solution for embedded processes running on each VM. A to 10 GbE, the number of CPU cycles systems. virtualized system, therefore, introduces required by the virtual switch to forward In summary, while the OS-hosted yet another level of memory abstraction. packets to and from the VMs and the approach offers the greatest application While this abstraction can be I/O will increase substantially—reducing and guest OS portability, the bare- accomplished using software, hardware the amount of CPU cycles available metal hypervisor approach offers the assistance will greatly improve the for the applications running on the best performance and the lowest performance of the system. Freescale VMs. Hardware support that eliminates virtualization overhead. and some compute-centric companies the need for the virtual switch, or have introduced enhancements to their significantly reduces the burden on it, memory management units that allow will inevitably improve the performance multiple levels of memory abstractions. of the system.

26 Beyond Bits Power Architecture Edition

Multicore Processors in New Freescale Enterprise Key Features of Data CenterFigure Solution 5: Key Features of Freescale’s Generation Data Center Networking and Data Center Data Center Solution Solutions Strategy Figure 5 shows key aspects of Freescale’s data center approach CoreNet Multicore SoC Freescale’s data center solution based provides embedded solutions for: Hardware High Speed I/O Hypervisor Broad SoC Convergence on virtualization. SMP/AMP OS Portfolio • Switching platforms • The CoreNet, hardware hypervisor • Storage platforms

irtualization and SMP/AMP OS technologies V QorIQ • Application delivery controllers of QorIQ processors help enable • Intelligent network interface controllers virtualization required for data center (NICs) and converged network networking. adapter (CNAs) Increased Bandwidth • With its multicore architecture, • Low power servers 10G today 40G Roadmap high-speed I/O and broad SoC portfolio, QorIQ processors facilitate Freescale is uniquely positioned to Freescale’s Virtualization convergence. enable common platforms through Solution • With the progressively increasing a broad portfolio of integrated SoC Discrete hardware-based solutions have speed and processing capacities solutions for control and data path been proposed by some companies in of QorIQ processors, Freescale is processing. Scalable multicore compute-centric industry. Freescale, on targeting 40G standards from the processor solutions facilitate common the other hand, has taken an integrated current mark of 10G today. platform architectures across OEM portfolios, maximizing OEM hardware data path architecture approach for Key differentiating features of Freescale’s and software investments with best- the embedded market in its QorIQ data center networking solution are: products. QorIQ processors provide in-class tools and software ecosystem • Data centers focused on reducing an integrated on-chip solution using including commercial solutions from power and cost a combination of in-line hardware Freescale VortiQa and third-party accelerators to parse, classify and • Network node consolidation driving partners. queue packets to different VMs and multiple functions into fewer platforms processors. Traffic from multiple network • Integrated service routers adding interfaces can be directed to different appliance capabilities partitions and VMs on the systems. • Management of multiple devices is QorIQ processors also offer built-in usually difficult and time consuming scheduling and priority mechanisms for the system to fairly distribute traffic Figure 6: DataFreescale’s Centers Enterprise Networking and among different partitions and VMs, as Data Center Strategy well as to allow policy-based sharing of the network I/Os by different VMs.

In a nutshell, Freescale’s approach is Network Storage Attached to allow partitioning and virtualization Arrays Storage of SoC, provide performance isolation Blade Data Data Path between different partitions and Servers Center Acceleration and ToR, EoR virtualized SoC and manage and protect Emerging and Core Low Power those resources. Switches Security Servers Scalable and Multicore Virtualization

Application Modular Delivery Switches Best-in-Class Networking Performance/Watt

Host Bus Fixed and Converged Configuration Network Adaptors, Switches Intelligent NICs

freescale.com 27 Networking

Table 1: Freescale’s Enterprise Networking and Data Center Product Offerings Main Functions New Technologies Freescale Offering Network • IP switching • High bandwidth and throughput • MPC83xx, MPC85xx, P10xx, P20xx, • WAN optimization virtualization P30xx, P4080, P5020 • Application delivery Security • Security/UTM • IDS/IPS • MPC8572, P1020, P2020 • Appliance • High-bandwidth SSL • High-bandwidth firewall Storage • SAN • Virtualization of resources • P3041, P4080, P5020 • Storage controllers • On-demand provisioning • NAS • Improved performance Computing • Server controllers • High bandwidth • MPC85xx -> P10xx -> P2040 • Virtualization • P3041 -> P4080 • P5020 Application • Server processors • Large multicore complex cluster cores • P4080

In the current era of computing, Conclusion per watt, so it is desirable to offload as there has been increased demand to many functions to hardware as possible Proliferation of multicore processors have a more robust and secure data and free CPU cycles to be allocated in embedded markets and the desire center. Understandably, one of the to applications. When it comes to to consolidate applications and primary concerns that a company has virtualization, the same philosophy is functionality will push the embedded while implementing a data center is to applied. While software-based solutions industry into embracing virtualization sustain business continuity. Because would work fine in the server and in much the same way as it occurred every company has a tremendous compute-centric markets, they are more with the server and compute-centric reliance on its IT operations and likely to degrade the performance of markets. because many of these IT operations an embedded system to unacceptable rely on data centers, it is extremely vital Differences in the characteristics of levels. In embedded markets, the for these data centers to be available embedded and compute-centric bare-metal hypervisor-based approach all the time throughout the year. With markets warrant different virtualization coupled with hardware virtualization its broad range of current and future approaches. The embedded market, assists in the core, the memory QorIQ and PowerQUICC III processors unlike the server and compute- subsystem and the I/O appears to offer and cutting edge technologies such centric markets, is sensitive to the the greatest performance over other as high bandwidth and throughput, power envelope a particular device approaches. virtualization, high bandwidth SSL and can dissipate and usually has very on-demand provisioning for improved constrained power budgets as performance, Freescale has the compared to those in the compute- capacity to meet the requirements of centric space. One of the primary the most robust data centers (see design objectives in the embedded Table 1). market is to maximize performance

28 Beyond Bits Power Architecture Edition

How to Reach Us:

Home Page: Information in this document is provided solely to enable system and software implementers to freescale.com use Freescale Semiconductor products. There are no express or implied copyright license granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information Power Architecture in this document. Portfolio Information: Freescale Semiconductor reserves the right to make changes without further notice to any products freescale.com/power herein. Freescale Semiconductor makes no warranty, representation or guarantee regarding the e-mail: suitability of its products for any particular purpose, nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any [email protected] and all liability, including without limitation consequential or incidental damages. “Typical” parameters which may be provided in Freescale Semiconductor data sheets and/or specifications can and do USA/Europe or Locations Not Listed: vary in different applications and actual performance may vary over time. All operating parameters, Freescale Semiconductor including “Typicals” must be validated for each customer application by customer’s technical experts. Technical Information Center, CH370 Freescale Semiconductor does not convey any license under its patent rights nor the rights of others. 1300 N. Alma School Road Freescale Semiconductor products are not designed, intended, or authorized for use as components Chandler, Arizona 85224 1-800-521-6274 in systems intended for surgical implant into the body, or other applications intended to support or 480-768-2130 sustain life, or for any other application in which the failure of the Freescale Semiconductor product [email protected] could create a situation where personal injury or death may occur. Should Buyer purchase or use Freescale Semiconductor products for any such unintended or unauthorized application, Buyer shall Europe, Middle East, and Africa: indemnify and hold Freescale Semiconductor and its officers, employees, subsidiaries, affiliates, and Freescale Halbleiter Deutschland GmbH distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney Technical Information Center fees arising out of, directly or indirectly, any claim of personal injury or death associated with such Schatzbogen 7 unintended or unauthorized use, even if such claim alleges that Freescale Semiconductor was 81829 Muenchen, Germany negligent regarding the design or manufacture of the part. +44 1296 380 456 (English) +46 8 52200080 (English) +49 89 92103 559 (German) +33 1 69 35 48 48 (French) [email protected]

Japan: Freescale Semiconductor Japan Ltd. Headquarters ARCO Tower 15F 1-8-1, Shimo-Meguro, Meguro-ku, Tokyo 153-0064, Japan 0120 191014 +81 3 5437 9125 [email protected]

Asia/Pacific: Freescale Semiconductor Hong Kong Ltd. Technical Information Center 2 Dai King Street Tai Po Industrial Estate, Tai Po, N.T., Hong Kong +800 2666 8080 [email protected]

For Literature Requests Only: Freescale Semiconductor Literature Distribution Center P.O. 5405 Denver, Colorado 80217 1-800-441-2447 303-675-2140 Fax: 303-675 2150 [email protected]

For more information, visit freescale.com/power

Freescale, the Freescale logo, PowerQUICC and QorIQ are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. CoreNet is a trademark of Freescale Semiconductor, Inc. The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org. All other product or service names are the property of their respective owners. © 2012 Freescale Semiconductor, Inc. Document Number: PWRARBYNDBITSMP REV 0