QNX Software Systems Ltd. 175 Terence Matthews Crescent Ottawa, Ontario, Canada, K2M 1W8 Voice: +1 613 591-0931 1 800 676-0566 Fax: +1 613 591-3579 Email: [email protected] Web: www.qnx.com

Making the Switch to RapidIO

Using a Message-passing Microkernel OS to Realize the Full Potential of the RapidIO Interconnect

Paul N. Leroux Technology Analyst QNX Software Systems Ltd. [email protected]

Introduction

Manufacturers of networking equipment have hit a bottleneck. On the one hand, they can now move traffic from one network element to another at phenomenal speeds, using line cards that transmit data at 10 Gigabits per second or higher. But, once inside the box, data moves between boards, processors, and peripherals at a much slower clip: typically a few hundred megabits per second.

To break this bottleneck, equipment manufacturers are seeking a new, high-speed — and broadly supported — interconnect. In fact, many have already set their sights on RapidIO, an open-standard interconnect developed by the RapidIO Trade Association and designed for both chip-to-chip and board-to-board communications. Why RapidIO? Because it offers low latency and extremely high bandwidth, as well as a small silicon footprint — a RapidIO interface can easily fit into the corner of a processor, FPGA, or ASIC. RapidIO is also transparent to software, allowing any type of data protocol to run over the interconnect. And, last but not least, RapidIO addresses the demand for reliability by offering built-in error recovery mechanisms and a point-to-point architecture that helps eliminate single points of failure.

Of course, other potential interconnect standards exist, including HyperTransport, InfiniBand, Fibre Channel, Gigabit/10G Ethernet, and PCI-based fabric interconnects. But RapidIO doesn't seek to supplant any of these. Rather, it is complementary, offering high performance — up to 64 Gbit/s per port — over the short distances needed to connect processors, memory, and local I/O within a single system.

An Emerging Standard

RapidIO is gaining momentum. Already, Motorola and IBM have begun to implement RapidIO interfaces on next-generation processors. Meanwhile, Altera has released a RapidIO solution for programmable systems-on-a-chip, and Tundra Semiconductor has announced both switch chips and host bridge chips, including PCI/PCI-X to RapidIO bridge chips for migration from existing designs. RapidIO has also gained acceptance from leading vendors such as Alcatel, Cisco, Ericsson, Lucent, Nokia, and Nortel — in fact, most of these companies are active members of the RapidIO Trade Association.

RapidIO itself continues to mature. Just recently, the RapidIO Trade Association released a serial specification that, combined with the existing RapidIO parallel specification, allows systems designers to standardize on a single interconnect technology. For example, semiconductor and ASIC developers bringing serial chips to market can now reuse much of their original RapidIO parallel design.

Locked Out by Default: The Problem with Conventional Software Architectures

While RapidIO provides hardware that is both fast and reliable, system designers must find or develop software that can fully realize the benefits of its advanced features — especially its support for distributed, multiprocessor systems. The problem isn't with RapidIO itself, which is generally software-transparent, but with software that can't easily migrate to high-speed distributed designs. Sometimes, the problem lies with application software that is written to be hardware- or protocol-specific. All too often, however, the problem lies not with the application, but with its underlying operating system (OS).

For instance, consider what happens whenever an application needs to access a system service, such as a device driver or protocol stack. In most OSs, such services run in the kernel and, as a result, must be accessed by kernel calls. Since kernel calls don't cross processor boundaries, these services can be accessed only by applications on the local node. It thus becomes difficult, if not impossible, to distribute an application and its related services across multiple nodes — they're effectively locked together on the same processor.

Distributed by Design: The Advantage of Microkernel Architecture

To make distributed systems much simpler to implement, the QNX® Neutrino® RTOS uses a message-passing microkernel architecture. In a microkernel OS, only the most fundamental OS primitives (e.g. threads, mutexes, timers) run in the kernel itself. All other services, including drivers, file systems, protocols, and user applications, run outside of the kernel as separate, memory-protected processes.


In this architecture, processes are inherently distributed. That's because they can use just one mechanism, synchronous message passing, to communicate with any other application or service, either local or remote. For instance, let's say an application process needs to send a message to a device driver. To do this, it simply issues a POSIX open() call on a symbolic name identifying that driver. The underlying C library will then convert the call into a message, and the OS will route the message to its destination (the driver). If the driver is local, the OS microkernel will route the message directly; if the driver is on another node, an OS service called the QNX micronetwork will transparently forward the message to that node.

In effect, it doesn't matter whether the driver is local or remote — the application uses the exact same code either way. Consequently, any process can, given appropriate permissions, transparently access virtually any resource on any other node, as if that resource were local. The QNX Neutrino RTOS thus eliminates much of the complexity in building a distributed system, whether that system is based on RapidIO or any other form of interconnect.

Moreover, existing software can migrate easily to RapidIO. Because of the clean division of labor provided by microkernel architecture, applications effectively don't "care" what protocol or media they send messages over — it can be an Ethernet LAN today or a RapidIO backplane tomorrow.

As indicated, software developers can implement distributed message passing using industry-standard POSIX calls such as open(), read(), and write(). But note that developers can also choose to access the messaging framework directly, using three simple calls: MsgSend(), MsgReceive(), and MsgReply().

Figure 1 — In the QNX Neutrino RTOS, the kernel contains only a small set of core primitives. All other system services — drivers, file systems, protocol stacks — are provided by separate, memory-protected processes that can be stopped and started dynamically. To achieve this modularity, QNX Neutrino uses message passing as the fundamental means of IPC for the entire system.

Figure 2 — QNX message passing integrates a network of individual nodes into a single logical machine. As a result, an application can access resources on any other node transparently, without having to invoke remote procedure calls.


Achieving Predictable Response in a Distributed System

When implementing remote calls in a distributed system, the software developer typically has to deal with complex synchronization issues and ensure that the calls return in a timely fashion. Remote calls can be especially problematic for a realtime application, since they typically provide no guarantee that requests going to and from other nodes will be processed in priority order. Predictable response times become difficult, if not impossible, to achieve.

The synchronous message passing provided by QNX Neutrino helps eliminate these issues. First, it automatically coordinates the execution of cooperating threads and processes. For instance, when process A sends a message to process B, process A will immediately stop running, or become blocked, until it has received a reply from B. This model ensures that the processing performed by B for A is complete before A can resume executing. It also eliminates the need to hand-code and debug synchronization services in either process.¹

QNX message passing also ensures predictable response times in a multi-node system by providing distributed priority inheritance. For instance, if a low-priority process sends a message to a driver, asking it to perform some work, the receiving thread in the driver will inherit that process's low priority. This ensures that the work done for the low-priority process doesn't execute at a higher priority than it should, thereby preventing a higher-priority process from accessing the CPU (a condition known as priority inversion). The important thing to remember is that, with QNX message passing, this priority inheritance applies whether the message originates from a process on the local node or on another node. This feature is key to ensuring that a system will respond predictably even when its applications and services are distributed across a number of CPUs.

In fact, QNX synchronous message passing generally makes designs simpler to implement and easier to maintain: it eliminates the need to keep track of complicated queuing behavior; it makes it easy to map a logical design to an actual implementation; and it fosters better encapsulation and information hiding by restricting interactions to well-defined, message-based interfaces between processes.

Figure 3 — QNX message passing simplifies the synchronization of processes and threads. For instance, the act of sending a message automatically causes the sending thread to be blocked and the receiving thread to be scheduled for execution.

¹ Besides message passing, QNX Neutrino also provides a full complement of conventional synchronization services, including mutexes, condition variables, sleepon locks, and semaphores.


Redundant Links for Greater Throughput and Fault Tolerance

To enable fast, predictable response times, the RapidIO architecture offers both determinism (achieved through multiple message-priority levels) and low latency. This low latency can be maintained under even heavy network loads, thanks to several features. For instance, if a link is busy, a packet doesn't have to return to the original source device; rather, it simply waits for the link to become available. And if the network load is particularly heavy, RapidIO allows the system designer to implement multiple links that, together, provide extremely high bandwidth. These links can also provide greater fault tolerance: if one link becomes unavailable, data can be re-routed over one or more of the remaining links.

Nonetheless, conventional OS architectures don't offer built-in support for multiple, redundant links between processors — whether those links are all based on RapidIO or some combination of RapidIO, fiber, Ethernet, serial, and so on. Consequently, it's up to the software developer to implement support for each link, a task that must be done "by hand" on an application-by-application basis. Each application may have to specify the primary link, any alternate links, and how to respond to those links. To complicate matters, the number of links, the kinds of links, and the policy for each link (e.g. use link x for failover, link y for load-balancing, etc.) can change from device to device, or even from installation to installation. In fact, a different policy may be required for each pair of processes talking across the links.

To address this problem, Qnet, the process that implements the QNX micronetwork, provides inherent support for multiple links. Again, applications require no special coding to take advantage of this feature, thanks to the network abstraction enabled by QNX message passing. Importantly, system designers can control how the redundant links are used, by choosing from the following classes of service:

Load-balancing — Queue packets on the link that will deliver them the fastest, based on current load and link capacity. When this policy is in effect, Qnet uses the combined service of all links to maximize throughput and allows service to degrade gracefully if any link becomes unavailable. Once a failed link recovers, it can automatically resume sharing the workload.

Preferred — Send all packets over the specified link until it becomes unavailable, at which point use the second (or third, or fourth) link. Qnet will automatically reroute traffic back to the preferred link once it has rejoined the pool of available links.

Locked — Use only the specified link. This policy is useful when only one of the available links is appropriate to certain traffic.

Figure 4 — Qnet provides multiple classes of service, allowing redundant links to increase throughput, fault tolerance, or both.

Of course, a policy appropriate for one application or type of traffic may not be appropriate for another. Consequently, Qnet can support multiple policies simultaneously for different connections across processes.


Seamless Routing

To further simplify distributed processing, QNX Neutrino makes full use of the device ID addressing used by RapidIO technology. In RapidIO, no address map or other addressing scheme is needed; addressing is done using the individual device's ID. QNX Neutrino can use the device ID as its own addressing scheme, making routing in a distributed system seamless to the application.

Using Standard Protocols

For more conventional networking, the QNX Neutrino RTOS also supports TCP/IP protocols between processors on the RapidIO interconnect. Supported stacks include:

· NetBSD TCP/IP stack — Supports the latest RFCs and functionality, including UDP, IP, and TCP. Also supports forwarding, broadcast and multicast, routing sockets, ARP, ICMP, and IGMP. To develop applications for this stack, programmers use the industry-standard BSD socket API.

· Enhanced NetBSD stack with IPSec and IPv6 — Includes functionality targeted at the new generation of mobile and secure communications. Provides full IPv6 and IPSec support through the KAME extensions, as well as support for VPNs over IPSec tunnels. Also includes optimized forwarding code for additional performance.

· Tiny TCP/IP stack for memory-constrained systems — Despite its small footprint (<80K), this stack provides complete support for IP, TCP, and UDP over Ethernet and PPP interfaces. To develop applications, developers use the BSD socket interface; in fact, developers can switch from the NetBSD stacks to the tiny stack without having to modify or recompile code.

Mutually Compatible

As we've discussed, QNX Neutrino's message-passing microkernel architecture provides a single, unified development model for creating both uniprocessor and distributed applications: the same, simple message-passing paradigm that works on a single CPU works identically across a cluster of CPUs. As a result, it is much easier to take advantage of RapidIO's distributed capabilities.

Still, this only touches the surface of how QNX Neutrino and RapidIO can, together, improve the design of networking equipment. For instance, QNX Neutrino's microkernel architecture also makes it possible for a system to recover intelligently from errors in almost any software module — even a faulty device driver can be dynamically restarted, without a system reset or user intervention. This software fault tolerance complements RapidIO's ability to recover from hardware errors, again without intervention. As a result, systems using both QNX Neutrino and RapidIO can achieve significantly greater uptime with little additional effort on the part of the systems designer.

There are yet other ways in which QNX Neutrino and RapidIO work in concert to simplify the design of high-availability, high-performance network elements. For a summary of these complementary benefits, see the attached "Affinity Table."


Affinity Table: RapidIO Architecture and the QNX Neutrino RTOS

RapidIO and the QNX Neutrino RTOS are both designed for high-performance communications equipment. As a result, they complement and extend each other in numerous ways. For instance, both provide inherent support for loosely coupled multiprocessing (clusters) and for tightly coupled, symmetric multiprocessing (SMP). Both offer low latency, deterministic response, automatic error recovery, and compatibility with a large pool of existing software. And both employ a modular architecture to keep hardware costs to a minimum. The following table summarizes the various benefits that QNX Neutrino and RapidIO can, in combination, bring to a network element.

Scalability

How RapidIO contributes:
· No practical limit. The architecture can address over 64,000 attached devices, using extended transport addresses.

How QNX Neutrino contributes:
· Support for thousands of memory-protected processes per node, with no practical limit on the number of nodes. (Note that nodes can share resources transparently; see "Distributed processing," below.)

Reliability and fault tolerance (see also "Redundant links between nodes")

How RapidIO contributes:
· Can recover from all single-bit and most multi-bit errors, without software intervention. Can also notify software of severe errors, allowing software to redirect traffic around a failed device.
· Point-to-point architecture helps eliminate single points of failure in hardware (e.g. a pin failure in the backplane).
· Error recovery techniques include:
  - performing error detection and correction on each link in the path of a data transfer
  - correcting corruption by using separate CRCs for the header and the data payload

How QNX Neutrino contributes:
· Can recover gracefully from software faults, even in low-level drivers and protocol stacks, without a system reset.
· Microkernel architecture helps eliminate single points of failure in software by running every application, driver, and protocol stack in a separate memory-protected address space. Faults are typically isolated to the process in which they occur.
· Automatic error recovery enabled by the QNX High Availability Toolkit (HAT), which checks for early warning signs that an error is occurring and allows the developer to specify recovery actions (e.g. stop and restart a faulty driver, connect a database to a backup file system, and so on).

Distributed processing (cluster computing)

How RapidIO contributes:
· Provides a unified processor bus, allowing multiple devices to act as peers. Any device can communicate directly with any other device.

How QNX Neutrino contributes:
· Built-in peer-to-peer networking allows any node to transparently access the resources of any other node.
· Uses RapidIO's source-based addressing scheme to identify the targets of transactions in a distributed system.
· Networking is integrated into the heart of the QNX message-passing primitives, making local and network-wide interprocess communication (IPC) one and the same. Thus, an application can access hardware resources on any other node without having to invoke remote procedure calls. Programs are inherently network-distributed.

Redundant links between nodes

How RapidIO contributes:
· Supports arbitrary topologies, allowing the systems designer to implement multiple redundant links for fault tolerance and extremely high bandwidth.

How QNX Neutrino contributes:
· Allows applications to communicate transparently across multiple redundant links.
· Links can be a combination of various media and protocols — RapidIO, fiber, Ethernet, and so on.
· Designers can control how traffic flows across redundant links to boost throughput, fault tolerance, or both.

Tightly coupled, symmetric multiprocessing (SMP)

How RapidIO contributes:
· Supports SMP through an optional distributed shared-memory extension.

How QNX Neutrino contributes:
· Supports true "shared everything" SMP through an optional SMP microkernel, which allows any thread to run on any available CPU for maximum performance and flexibility.
· Applications, drivers, and OS modules can move unmodified from single-processor to SMP systems. No need to "hardcode" SMP awareness into each software process.

Fast, predictable response times

How RapidIO contributes:
· Provides determinism through multiple message-priority levels, which ensure that high-priority transmissions reach their destination in a fast, predictable manner, even under heavy network loads.
· Offers low latency through techniques such as:
  - allowing packets to wait until a link becomes available (no need for packets to return to their origin)
  - performing automatic error recovery in hardware
· Offers lower latency than bus technologies like PCI and PCI-X, while using a smaller silicon footprint.

How QNX Neutrino contributes:
· Provides deterministic response times, both at the application level and within all subsystems. As a result, any time-critical task, such as a routing table update, can be processed in a fast, predictable manner, even when many other processes demand CPU time.
· Predictable response achieved through techniques such as:
  - priority-driven preemptive scheduling
  - nested interrupts
  - priority inheritance for all processes
  - very fast interrupt latencies and context switches (less than one microsecond per context switch on a Motorola MPC7450 processor)
· Supports network-distributed priority inheritance so that requests from multiple nodes are processed in priority order. As a result, a system can respond predictably even when its applications and services are distributed across a number of CPUs.

Small footprint, lower hardware costs

How RapidIO contributes:
· Modular specification — OEMs can include only those functions needed by the application, minimizing silicon footprint.
· The interface can fit in a corner of a processor, FPGA, or ASIC.
· Uses few signal pins (e.g. the 8-bit parallel version uses only 40 pins per bi-directional port, fewer than PCI requires).

How QNX Neutrino contributes:
· Modular architecture — The QNX microkernel and its accompanying Process Manager contain only the most fundamental services: signals, scheduling, process creation, etc. All other system services (file systems, protocol stacks, drivers, etc.) exist as optional, add-on processes. OEMs can include only those OS modules required by the application, minimizing memory costs.
· Simplified "thin client" computing — The network transparency provided by QNX message passing allows even the most resource-constrained node to access resources on other nodes: disks, network cards, protocols, databases, etc.

Support for open standards

How RapidIO contributes:
· Developed through a cooperative effort involving Alcatel, Cisco, Lucent, Motorola, Nortel, and other major networking companies.
· Freely available to the industry.

How QNX Neutrino contributes:
· OS engineered from the ground up to support industry-standard POSIX APIs, allowing OEMs to leverage the large Unix/Linux developer community.
· Graphical Integrated Development Environment (IDE) based on Eclipse, an open, extensible framework that allows developers to integrate tools from multiple vendors.

Compatibility with existing hardware and software

How RapidIO contributes:
· Transparent to application software. From a software perspective, the RapidIO interconnect "looks" like a traditional microprocessor and peripheral bus.
· Also bridges easily to PCI and PCI-X, allowing designers to use legacy PCI chips.

How QNX Neutrino contributes:
· Standard APIs allow developers to easily reuse code developed for other POSIX/Unix/Linux OSs. Popular Internet software like Apache, Perl, and GateD can run natively, without code changes.
· Supports a wide variety of processors and boards used in communications products: PowerPC, MIPS, x86, SH-4, ARM, StrongARM, XScale.

Hardware independence

How RapidIO contributes:
· The RapidIO logical specification, which defines the protocol and packet formats, can be transmitted over any interface (serial, parallel, etc.) and any media (copper, fiber, etc.).

How QNX Neutrino contributes:
· Transport independence — QNX message passing hides details of the underlying network media and protocols from applications. As a result, new media and protocols can be introduced without application recoding.
· Processor independence — QNX Neutrino allows many applications and drivers to be source-code identical across processor families.

Ease of extensibility

How RapidIO contributes:
· The specification is partitioned to support future protocol extensions.
· Designed so that switch fabric devices don't have to interpret packets, enabling forward compatibility.

How QNX Neutrino contributes:
· Since most system services (drivers, file systems, protocols, etc.) are implemented as memory-protected processes, extending the OS is simply a matter of writing new user-space programs. Programmers can create OS extensions using standard source-level tools and techniques — no kernel programming or debugging required.
· Drivers, protocols, and applications can also be stopped and started dynamically, allowing operators to extend a live system "on the fly." No need for system resets or service interruptions.

Low transaction overhead to take full advantage of available bandwidth

How RapidIO contributes:
· Source routing — Ensures that only the path between the sender and receiver is burdened with the transaction. No need for the transaction to be regenerated by a host controller.
· Packet headers are as small as possible and organized for fast assembly and disassembly, so control overhead is minimal.
· Efficiency increases as the data in each packet increases.

How QNX Neutrino contributes:
· Direct copying of messages — Message passing is a direct operation between the sender and receiver only. No intermediate copying is needed when messages are exchanged on the local node, enabling message passing to perform on a par with conventional IPC.
· Since most QNX messages are quite tiny, the amount of data moved around the network can be far less than with network-distributed shared memory. To further conserve network bandwidth, the QNX RTOS supports combine messages, which package multiple messages into a single message, thereby reducing the number of transactions.

© 2002 QNX Software Systems Ltd. All rights reserved. QNX, Momentics, Neutrino, Photon microGUI, and ‘Build a more reliable world’ are registered trademarks in certain jurisdictions, and Qnet is a trademark, of QNX Software Systems Ltd. All other trademarks and trade names belong to their respective owners.
