Technical White Paper DATA CENTER www.novell.com

Concurrent Real-Time Extensions

Powered by SUSE® Concurrent Real-Time Extensions Powered by SUSE Linux

Table of Contents: 2 . . . . . Rely on Real-time Capabilities for Linux

2 . . . . . Explore New Real-time Services

2 . . . . . Concurrent Real-Time Extensions Features

5 . . . . . Programmatic Features

6 . . . . . System Deployment and Management

7 . . . . . Building Your Service-oriented Infrastructure (SOI)

9 . . . . . Virtual Computing Platform

9 . . . . . Integrated Services

9 . . . . . Summary

p. 1 Rely on Real-time Capabilities for Linux*

By choosing Linux and Many of today’s businesses—including your real-time data center, you can configure Concurrent RTE for your financial institutions, manufacturers, govern- a modular, rich system that helps you consoli- real-time data center, ment agencies, telecommunications carriers date servers, reduce costs and consistently you can configure a and medical providers—require deterministic, achieve high quality of service. modular, rich system that real-time computing. In other words, their high- helps you consolidate priority transactions, line processes, test data Explore New Real-time Services servers, reduce costs acquisitions and product simulations must With recent real-time enhancements to and consistently achieve execute accurately and predictably every the SUSE Linux kernel, Concurrent RTE high quality of service. time—at sub-second speeds. Because of showcases a highly flexible foundation. This these exacting requirements and the need for foundation is essential for applications that tight security and interoperability, a growing require deterministic behavior. For example, number of businesses are turning to Linux*. financial-trading applications must respond to real-time events, such as new pricing, You’ve witnessed firsthand the remarkable without any delays or interruptions. Likewise, expansion and adoption of the Linux operat- in telecommunications applications, external ing system. It’s gained a significant footprint signals must be processed instantly and in the enterprise data center and currently without dropping a bit. It’s also extremely supports thousands of applications. Now, important for government mission-critical Novell®, Inc. and Concurrent Computer environments such as missile defense, air Corporation have enhanced Linux to deliver traffic control, and emergency response the real-time capabilities you need. Concurrent systems to their respective trans- Real-Time Extensions Powered by SUSE® actions without delay. Linux (hereafter called Concurrent RTE) is an industry-standard, real-time version of Linux Now you can take a closer look at our for Intel- and AMD-based multiprocessors. real-time offerings for Linux. We outline the This platform, an enriched distribution of the specific technical capabilities of Concurrent SUSE Linux kernel from Novell, provides RTE, including enterprise services offerings. you with guaranteed performance in time- You will also see how Concurrent RTE delivers critical environments. a solid foundation for your Service-oriented Infrastructure (SOI). You can use it to build the Key features—including CPU shielding, next generation of data-center computing, kernel preemption, low latency and priority creating an on-demand infrastructure that inheritance—help you achieve sustainable is modular, highly responsive and easy to real-time performance on Linux. You can deploy and control. also rely on other important capabilities that improve quality of service and provide Concurrent Real-Time a solid foundation for a Service-oriented Extensions Features Infrastructure. These features include distrib- uted shared memory, Xen virtualization and Built on the powerful SUSE Linux kernel, real-time cluster file systems, all of which are Concurrent RTE delivers real-time perform- enhanced significantly by the low latency and ance and capabilities. These enhancements, deterministic processing in Concurrent RTE. while easy to deploy, should be configured By choosing Linux and Concurrent RTE for by a professional who fully understands the

p. 2 Concurrent Real-Time Extensions Powered by SUSE Linux www.novell.com

application requirements and can tune Built on the powerful SUSE Linux kernel, your system for optimal performance. Concurrent RTE features centralized man- Concurrent RTE delivers real-time agement and the NightStar™ application performance and capabilities. development tools. The five NightStar tools in Concurrent RTE help you non-intrusively debug, monitor, analyze, tune and schedule A shielded CPU provides a deterministic You can use Concurrent time-critical applications. environment, one with predictable guaranteed RTE to build the next responses to , timers, messages generation of data-center Although extremely valuable for time- and network packets. Shielded CPUs are computing, creating an critical environments, Concurrent RTE is not valuable because they simultaneously provide on-demand infrastructure appropriate for all application deployments. a standard Linux execution environment and that is modular, highly Real-time systems deliver extraordinarily superior quality of service. responsive and easy to predictable application response at the deploy and control. potential cost of overall system throughput. When you run an application on a shielded For example, in order to guarantee predictable CPU, you can be assured that regular system response time for high-priority applications, processing (such as clock/device interrupts normal system processing (e.g., daemons, or system daemons) will not negatively affect interrupts and so forth) is occasionally delayed or application execution. so the system can first complete these high- priority processes. Kernel Preemption You can configure Concurrent RTE so that The following sub-sections outline specific a high-priority process preempts a lower modifications to Linux that allow highly priority process already executing inside predictable behavior: the kernel. Although standard Linux offers a configuration parameter for this behavior, CPU Shielding most Linux distributions do not ship with the Using Concurrent RTE, you can shield parameter enabled. With kernel preemption selected CPUs from unpredictable processing disabled, a lower-priority process executing elements—such as interrupts and system inside the kernel would continue running daemons—that sometimes delay the execution until it exited the kernel. At that time, the of high-priority applications. For example, high-priority process would begin running. suppose a financial trading application was By enabling kernel preemption, you’ll see context-switched during a critical transaction. substantially faster response times for high- Without CPU shielding, this type of switch priority processes. When you have a real- would delay transaction completion until the time process ready for execution (perhaps in system could finish running the application. response to an asynchronous system event) the system will immediately interrupt or con- By marking a CPU as shielded, you modify text-switch any lower-priority processes and the affinity masks of all processes and inter- start running the higher-priority real-time event. rupts in the system. In fact, the shielded CPU is removed from the affinity masks for most Low-latency Enhancements processes and interrupts. Only processes To protect the shared data structures it and interrupts that have explicitly set their uses, the Linux kernel relies on spin locks CPU masks to run on the shielded CPU will and semaphores to secure the code paths be able to do so. leading to those structures. To use spin locks,

p. 3 Concurrent RTE provides kernel tunables To correct this problem, priority inheritance is implemented within the Linux semaphore that allow a system administrator to set the semantics. Priority inheritance involves tem- priority of the kernel daemons that handle porarily raising the priority of a low-priority process that is holding a semaphore when deferred interrupt processing. a higher priority process attempts to gain access to that semaphore. This ensures that the processes executing in a critical section The five NightStar tools the kernel must disable preemption—and have sufficient priority to continue execution in Concurrent RTE help sometimes interrupts—to prevent the dead- until they leave the critical section. you non-intrusively locks that would occur when an interrupt to debug, monitor, analyze, the current code path also attempts to lock SoftIRQ Enhancements tune and schedule time- the spin lock. Disabling preemption nega- critical applications. tively affects latency for high-priority threads Linux supports several mechanisms that of execution. are used by interrupt routines in order to defer processing that would otherwise have Concurrent has developed a standard been done at interrupt level. The processing methodology to identify the worst-case required to handle a device interrupt is split “preemption disabled” times. The low-latency into two parts. The first part executes at inter- enhancements applied to Concurrent RTE rupt level and handles only the most critical modify the algorithms in the identified worst- aspects of interrupt-completion processing. case preemption-disabled scenarios to provide better response times. The second part is deferred to run at program level. Interrupt-level processing is This capability directly impacts process- always performed at a priority greater than dispatch latency times for real-time applica- any program-level processing. The system tions. By refining kernel algorithms to allow can achieve better program scheduling efficient process-dispatch latency, a high- latency by removing non-critical processing priority application can receive near instan- from interrupt level. The second part of an taneous attention from the running system. interrupt routine can be handled by kernel daemons, depending on which deferred Priority Inheritance interrupt technique is used by the device driver. Concurrent RTE provides kernel tun- When used as sleepy-wait mutual-exclusion ables that allow a system administrator to mechanisms, semaphores can introduce the set the priority of the kernel daemons that problem of priority inversion. Priority inversion handle deferred interrupt processing. When occurs when a high-priority process attempts a real-time task executes on a CPU that is to access a semaphore held by a low-priority handling deferred interrupts via these dae- process. If a medium-priority process pre- mons, it is possible to set the priority of the empts the low-priority process while it is deferred interrupt kernel daemon so that a holding the semaphore, the low-priority high-priority user process has a more favor- process will not be able to complete the able priority than the deferred interrupt kernel critical section or unlock the semaphore. daemon. This delivers a more deterministic In this scenario, the low-priority process response time for the real-time processes blocks the high-priority process, creating that execute on this CPU. a “priority inversion” that can last for an indeterminate time (typically until the medium- priority process is completed).

p. 4 Concurrent Real-Time Extensions Powered by SUSE Linux www.novell.com

Programmatic Features To use spin locks efficiently, you can leverage Concurrent RTE provides a mechanism that quickly disables preemp- a mechanism for accurate There are several additional kernel capabilities tion. This mechanism specifies a location CPU-execution time that ship in Concurrent RTE. You will receive within a process’s address space monitored accounting. It also supplemental libraries—and in some cases by the kernel. When this address location is delivers better application- additional commands—to help you access set to a non-zero value, the process will not performance monitoring. these features: be preempted in its execution. High-resolution POSIX Clocks When you combine spin locks with this and Timers preemption-disabling mechanism, you can Concurrent RTE includes support for high- effectively lock a critical section in a user resolution POSIX clocks and timers. You can application with just two memory accesses. use POSIX clocks to perform time stamping or to measure the length of code segments. High-resolution Process Accounting POSIX timers will allow your applications to The standard Linux kernel calculates a schedule events on an individual or periodic process’s CPU-execution times with a very basis. All Concurrent RTE timers are based on coarse-grained mechanism. This means the a high-resolution clock, and you can choose amount of CPU time charged to a particular to specify times in relative or absolute time. process can be very inaccurate. The high- In addition, Concurrent RTE provides high- resolution process-accounting facility in resolution sleep mechanisms that you can Concurrent RTE provides a mechanism for use to put a process to sleep for a very accurate CPU-execution time accounting. It short time. also delivers better application-performance monitoring. Standard Linux CPU accounting Post/Wait (Fast Wakeup Services) services make use of this facility in reporting In Concurrent RTE, you can use the CPU usage. post/wait service to quickly activate and deploy a blocked or suspended process. Support for Inter-system Time Without this service, extensive system-level Synchronization processing can significantly delay activation Using Concurrent RTE, you can synchronize (or “wakeup”). your Linux system time to GPS standard time. This is possible because Concurrent User-level Preemption Control RTE supports the Concurrent Real-time When an application has multiple processes Clock & Interrupt Module (RCIM) PCI card. that run on multiple CPUs and rely on shared The RCIM is a multi-function PCI card that data, you need a way to protect that data offers two optional features for maintaining against simultaneous access by more than highly accurate system time. The first option one process. The most efficient mechanism is a GPS feature that keeps system time in for protecting shared data is a spin lock; synch with GPS standard time. The second however, your application cannot effectively option is a highly accurate crystal that enables use a spin lock if there’s a chance it could you to synch system time across multiple be preempted while holding the lock. (This is nodes in a cluster or across multiple systems. because the preemptive process might also try to lock the spin lock, thereby causing Privileged Operations a deadlock.) The Pluggable Authentication Module (PAM) in Concurrent RTE provides a mechanism you can use to assign privileges to users

p. 5 Because Concurrent RTE and set authentication policies without hav- breakpoints and application patching while is fully compatible with ing to recompile authentication programs. debugging. The NightProbe™ tool helps you standard Linux user- Using this feature, you can give a non-root easily monitor the values of variables in an level APIs, you can run user the ability to run applications that would executing application, the NightTune™ tool applications written for normally require root-level privileges. provides a graphical interface for kernel other Linux distributions statistics and other system-performance without recompiling or Some privileged operations—like the ability to metrics, and the NightSim™ tool helps you modifying them. lock pages in memory—are crucial for real- schedule and monitor time-critical applica- time applications. (This is because unantici- tions that require predictable, cyclic process pated page faults can significantly extend execution. In addition to symmetric multi- the time it takes to execute a code path.) processors, NightSim supports multiple However, the ability to lock pages in memory systems connected via Concurrent’s RCIM. is a privileged operation in standard Linux, NightSim simplifies the creation of distributed an action only permitted to the root user. scheduling and provides a single point of control for individual schedulers distributed You can control access to page locking and across multiple target systems. many other privileged operations through a set of predefined rights. You simply assign System Deployment and these rights to individual users or groups, Management and privileges are granted through roles you It’s easy to deploy Concurrent RTE. You define in a configuration file. A role is a set of simply open the SUSE Linux management valid Linux capabilities. You can use defined interface—known as YaST—and select the roles as building blocks for subsequent real-time kernel. This installs the real-time roles, with the new role inheriting the capa- extensions on your SUSE Linux kernel. And bilities of the previously defined role. As you because Concurrent RTE is fully compatible assign roles to users and groups, you can with standard Linux user-level APIs, you can define the specific capabilities permitted to run applications written for other Linux distri- non-root users. butions without recompiling or modifying them. NightStar Application Development YaST consolidates management, giving you and System Tuning Tools a centralized way to activate clustering and The five NightStar tools in Concurrent RTE real-time features. In addition, management help you non-intrusively debug, monitor, tools from various original equipment manu- analyze and tune time-critical applications. facturers (OEMs) and independent software The non-intrusive nature of these tools is vendors (ISVs) easily interface with YaST. essential because you don’t want the debug- With this extraordinary interoperability and ging or tuning processes to change your complete manageability, you can use the application’s behavior. same operating system (OS) for everything from standard computing grids to high-avail- With the NightTrace tool, you can review a ability clusters requiring sub-second failover time-based graphical display that represents capabilities. Linux already spans an enormous both kernel- and user-level execution that range of processing, including telephone occurred while the application was running. handsets, desktops, servers, mainframes The NightView™ tool is a source-level multi- and much more. It’s not surprising that it’s process and multi-thread debugger that sup- the platform of choice for grids with thou- ports features like low overhead conditional sands of processors.

p. 6 Concurrent Real-Time Extensions Powered by SUSE Linux www.novell.com

Building Your Service-oriented Real-time Distributed Shared Memory With Concurrent RTE, Infrastructure (SOI) (RT-DSM) you can optimize Distributed Shared Novell and Concurrent are laying the ground- Many proprietary applications leverage Memory based on work for a Service-oriented Infrastructure (SOI), Distributed Shared Memory (DSM) capabilities Infiniband. You can even based on a highly deterministic computing to implement highly efficient clustering solu- carve out dedicated environment. The critical features include CPU tions. These DSM implementations typically real-time lanes from shielding, kernel preemption, low latency take advantage of high-speed interconnects an IB connection. and priority inheritance, all described in this such as Infiniband (IB). However, one of the paper. Our SOI vision extends the quality of primary problems with DSM implementations service (QoS) features currently available at is that they cannot keep all nodes coherent a system level to a distributed computing with one another when one node updates environment. A number of other important locations in the shared memory segment. services, including Distributed Shared Memory, For certain types of applications, like financial Xen virtualization and real-time cluster file trading programs, a consistent view of DSM systems are enhanced significantly by low is critical. latency and deterministic processing. With Concurrent RTE, you can optimize To address data growth and server consoli- Distributed Shared Memory based on dation challenges, you need a solution that Infiniband. You can even carve out dedicated solves tomorrow’s data center problems real-time lanes from an IB connection. A dedi- today. One such solution is a Service-oriented cated lane provides guaranteed bandwidth in Infrastructure. It requires a strong modular the connection between nodes. A dedicated foundation and enables you to add or sub- lane also provides better latency guarantees, tract discrete components without affecting so you’ll know exactly when all your DSM- application performance. This flexible “build- attached nodes will be updated. Finally, ing block” approach has tremendous benefits this feature allows multiple nodes to share from a cost perspective. High demand will a coherent view of the shared memory. ensure that these building blocks are primarily, if not exclusively, comprised of commodity In the real world, it’s difficult to move an components. This will allow OEMs and ISVs application from one node to another without to design with an out-of-the-box mentality. data loss. This is because process migration Likewise, you will be free to pick only the requires you to move huge amounts of appli- components you need as you custom cation memory from one node to another. design your own system. By using Distributed Shared Memory, you significantly reduce the amount of memory With its real-time capabilities, Concurrent RTE you have to transfer, since not all memory delivers a sophisticated foundational building must reside on the node where an applica- block for your Service-oriented Infrastructure. tion runs. Examples of other building-block components include Xen virtualization, fabric-based quality If your applications leverage Distributed of service, cluster file systems and Distributed Shared Memory and also require persistent Shared Memory. As these additional com- storage, you would greatly benefit by inte- ponents are refined and optimized, you will grating a cluster file system along with the be able to deploy them on Concurrent RTE. DSM module. Concurrent RTE enhances The following sections explore the advan- both of these capabilities. tages of deploying a real-time kernel with the other building blocks of a Service- oriented Infrastructure.

p. 7 It’s historically been difficult to build efficient unsynchronized times if it runs on different parts of the cluster (due to clusterwide pro- virtual environments because they add cess migration). If you’re running an applica- overhead to the executing system. tion of this kind, it would greatly benefit from system clocks that are fully synchronized across an entire cluster of machines. Your applications get Real-time Xen Hypervisor guaranteed quality Real-time Cluster File Systems It’s historically been difficult to build efficient of service (QoS) from (RT_CFS) virtual environments because they add over- the real-time features head to the executing system, especially when Cluster file systems rely on synchronization in Concurrent RTE. compared against native OS technologies. mechanisms, typically distributed lock man- The ability to not be An exciting open source project, Xen, multi- agement (DLM), to guarantee data integrity interrupted is desirable— plexes resources at the granularity of an across the cluster. You can also use DLM to if not essential—for many entire OS, as opposed to process-level synchronize access to metadata objects for of today’s applications. multiplexing. (This GuestOS is also called the file system and provide POSIX file-locking a container.) All user-level applications run capabilities for user-level data. within a GuestOS and each GuestOS may be dedicated to a single application or it may Software layers on top of each OS facilitate deploy multiple applications. The Xen Hyper- intra-cluster communication and manage visor is responsible for context switching services and resources. Specifically, a cluster GuestOSes. This process involves not only file system provides each node in the cluster saving the state of the GuestOS, but also all with access to the same storage partitions the applications running within the GuestOS. that reside on the Storage Area Network (SAN). This enables applications on each Recent enhancements to the Xen Hypervisor node to access the same files. Using have made it much more sensitive to schedul- Concurrent RTE, your cluster file systems ing priorities, memory demand, and network can take advantage of deterministic behavior and disk traffic. A latency-aware Hypervisor throughout the cluster, resulting in overall enables GuestOSes to run in a virtual machine performance improvements. environment without compromising overall system performance. In addition, the Fabric Virtualization GuestOSes can possess real-time priorities, (QoS-based Clusters) enabling virtual containers (and all applica- Your applications get guaranteed quality tions in those containers) to run without nor- of service (QoS) from the real-time features mal system disruptions (such as interrupts in Concurrent RTE. The ability to not be and daemons). Finally, a real-time GuestOS interrupted is desirable—if not essential— delivers granular control over system level for many of today’s applications. In fact, resources (such as the processor, memory many of today’s computing-intensive tradi- and network) in a virtualized environment. tional applications would greatly benefit from being able to complete tasks without Synchronized Real-time Timers scheduler-driven interruptions. Time-critical applications that run in a clustered environment will likely need time- It is our goal to enrich the real-time capabili- stamping capabilities. A normal clustered ties currently provided within a system and environment provides no guarantees of clock offer these capabilities across the distributed synchronization across multiple cluster nodes. computing network. Real-time functionality In fact, an application could stamp completely must extend across the connecting fabric

p. 8 Concurrent Real-Time Extensions Powered by SUSE Linux www.novell.com

in such a manner that the fabric respects Support Services The following attributes the issuing process’s priority across the are crucial to a Concurrent RTE requires extensive modifica- network. This is the primary attribute of a commodity platform: tions to the Linux kernel. When you deploy QoS-based cluster. mission-critical applications on this real-time Lower cost Virtual Computing Platform infrastructure, you’ll want to keep your kernel Higher performance environment up to date with the latest releases. Modular components The virtual computing platform will be to An integrated services team from Novell Open standards distributed computing what the automobile and Concurrent is ready to provide you with was to the horse and carriage. If your data world-class support for this offering. Novell centers are overwhelmed, and you want to front-line support personnel will assess your consolidate siloed application architectures issue and engage the Concurrent real-time and widely underutilized resources, it’s time support infrastructure as appropriate. to work toward virtual computing. Professional Services Virtual computing is a revolutionary technol- The following professional service offerings ogy that will integrate commodity solutions. are also available for Concurrent RTE: However, today’s businesses must first achieve a successful mix of proprietary and commod- Deployment services. Service engineers ity solutions before they can transition to the will help you deploy your real-time envi- virtualized data center. The recent economic ronment. This offering includes system downturn emphasized that over-provisioned administrative support as well as real-time and underutilized data centers are unafford- application and system tuning. able and drive unnecessary capital spending. Custom image builds. Service engineers By adopting platforms based on commodity can help you create custom real-time components, you will be much more likely to images for client environments deployed maximize your return on investment. across a wide variety of systems. Application migration and real-time We anticipate that the real-time building blocks enablement. These services help you at the core of the operating system will even- migrate proprietary applications to the real- tually extend out to the fabric and create a time environment. This includes application QoS-based distributed computing environ- enablement, so you can take advantage of ment. This on-demand environment will be the many programmatic features offered in highly sensitive to latency requirements in a Concurrent RTE. network-computing infrastructure and serve Training. Our services organization offers as the required disruptive technology that many training courses around real-time enables the virtual computing platform. In deployments. You can opt for a Web-based this setting, agility will become one of your seminar or choose hands-on training ses- most attractive competitive advantages, as sions that focus on deploying real-time you provide customers with dynamic and systems and writing real-time applications. consistent results.

Integrated Services Summary Concurrent RTE delivers a unified environment The Concurrent RTE solution offers a complete for workloads across different industry seg- suite of services delivered by an integrated ments. It provides high availability, real-time Novell and Concurrent team. processing and predictability for your individ- ual systems and all the components that connect them.

p. 9 An integrated services team from Novell and Concurrent is ready to provide you with world-class support for Concurrent RTE.

www.novell.com

With Concurrent RTE as your foundation, you design and development of new software can create a Service-oriented Infrastructure, products for real-time systems. He is also rapidly integrating other technologies and responsible for presenting Concurrent’s Contact your local Novell standards that support agile network-based real-time products to customers, using Solutions Provider, or call computing. Looking ahead, you’ll be able to these forums to better adapt the company’s Novell at: automate and manage your software services product line to customer needs. 1 888 321 4272 U.S./Canada and resources as commodity components 1 801 861 4272 Worldwide and then deploy them freely on a virtual- Moiz Kohari is Principal Technologist at Novell, 1 801 861 8473 Facsimile computing platform. Finally, your ability leading the SUSE Lab real-time and cluster- to quickly respond to system-level events ing initiatives. Kohari is deeply engaged in For more information on Concurrent RTE, visit: and high-priority processes will give you Novell efforts regarding fabric virtualization www.novell.com a network that carries quality of service and quality of service offerings, including guarantees across the fabric. Carrier Grade Linux. Before joining Novell, Novell, Inc. Kohari was Founder and CEO of Mission 404 Wyman Street About the Authors Critical Linux, an organization that pioneered Waltham, MA 02451 USA sub-second failover and clustering technol- Steve Brosky is Chief Scientist of Real-Time ogies on Linux. He has 20 years’ experience Products for Concurrent Computer Corpora- designing, implementing and supporting tion, a leading provider of high-performance, operating systems. Kohari was a member real-time Linux software and solutions for of Digital/Compaq’s Tru64 UNIX custom commercial and government markets. As development and support team, concen- Chief Scientist, Brosky determines future trating on enterprise computing and development strategies, defines new soft- multiprocessor system performance. ware projects and actively participates in the

460-002012-001 | 05/06 | © 2006 Novell, Inc. All rights reserved. Novell, the Novell logo, the N logo and SUSE are registered trademarks of Novell, Inc. in the United States and other countries.

Concurrent Computer Corporation and its design are registered trademarks and NightStar, NightView, NightTrace, NightSim, NightProbe, NightTune and NightSim are trademarks of Concurrent Computer Corporation.

*Linux is a registered trademark of Linus Torvalds. All other third-party trademarks are the property of their respective owners.