White Paper

Migrating from Wind River* VxWorks* and Legacy Embedded Software to Linux* on Intel® Architecture

Freeing developers to incorporate the technologies they need for high-performance embedded applications

www.intel.com/go/embedded
www.intel.com/go/ica

Contents

1.0 Introduction
2.0 Reasons to Migrate
  2.1 Unleashing the Potential of Intel® Architecture
  2.2 Broad Range of Software, Firmware and Hardware
3.0 Applications Migrating to Linux
  3.1 Intel Architecture Application Focus
4.0 Comparing Legacy RTOS and Linux System Architectures
  4.1 Memory Architectures
  4.2 Tasks, Processes and Threads
  4.3 Inter-Task Communications
5.0 Run-Time Architectures
  5.1 RTOS Run-Time Emulation
  5.2 Partitioning and Virtualization
  5.3 Native Application Execution Under Linux
6.0 The Migration Process
  6.1 Choosing a Migration Path
  6.2 Choosing a Migration Architecture
  6.3 Mapping Legacy RTOS APIs and System Calls to Linux Equivalents
  6.4 Stepwise Migration
7.0 Meeting Key Migration Challenges
  7.1 Real-Time Responsiveness
  7.2 Migrating Data Types
  7.3 Time Management
  7.4 Hardware Interfacing and Device Drivers
8.0 Migration Resources
  8.1 Commercial Solutions
  8.2 Professional Services and Training
  8.3 Other Migration Tools
9.0 Conclusion

References

1.0 Introduction

Linux* has progressed from playing a marginal role in embedded software for intelligent devices to presenting a solid choice for a variety of applications. Nowhere is this trend more evident than in OS platform software for embedded applications that need to combine high-performance networking, superior security, standards-based interoperability and fault resilience. In particular, developers of embedded applications such as communications infrastructure equipment, storage network elements, and security applications are turning to Linux for their current and next-generation designs.

In enterprise IT, these application demands, combined with lower cost of ownership and a vibrant software supplier ecosystem, have contributed to wide adoption of Linux in the corporate data center. Successful enterprise and device software deployments share the need for rapid time-to-market, and developers are addressing that need through commercial off-the-shelf (COTS) software and hardware solutions.

Just as Linux currently represents the industry standard off-the-shelf OS for enterprise applications and a leading software platform for embedded designs, Intel® Architecture (IA) processors provide the standard CPUs and hardware platforms in data centers and in intelligent embedded devices. Wind River* provides a standard Linux development environment and carrier-grade Linux solution for all phases of Linux-based device development on IA processors.

In enterprise applications, Linux builds upon 30 years of UNIX* software technology and business know-how, as well as a decade of server and desktop Linux use. With embedded applications, however, the path from legacy OSes to Linux-based deployment can appear less straightforward.

2.0 Reasons to Migrate

Motivations for moving to Linux as an embedded software platform are many and varied. Device OEMs and platform suppliers agree on the following key benefits realized from migration:

• Broad Range of Hardware Support • Standards Compliance • Performance Scalability & Responsiveness

• Lower Total Costs • Enhanced Security • Numerous Development Options

• Greater Reliability • High-Performance Networking

Table 1 discusses each of these benefits in more detail.

Table 1: Benefits of Migrating to Linux on Intel® Architecture

Broad Range of Hardware Support: Off-the-shelf Linux supports perhaps the broadest range of single-board computers and peripherals of any OS for embedded applications. This rich capability set comes from the global community of Open Source developers who build and deploy software for IA-based PCs, blades, servers and embedded devices. From that rich base, Linux is also able to support other Intel processors, such as Intel XScale® technology-based network processors. Developers can also turn to Wind River Professional Services for further customization and optimization of Linux for their hardware configuration and applications.

Similarly, the Intel® Communications Alliance, an ecosystem of hardware and software providers, provides products and services that complement and extend the value of Intel products.

Standards Compliance: Linux implements and complies with a range of core standard interfaces, networking protocols, and other standard elements that both ease application migration to Linux and ensure the portability and longevity of new applications. In particular, Linux complies with:
• POSIX 2000 definition of a UNIX OS
• POSIX threading and inter-process communications
• Berkeley sockets and IPCs
• IETF RFCs for networking, including TCP/IP, routing, etc.
• X11R6 graphical user interface
• SA Forum management interfaces
Linux also presents its own emerging standards, most importantly the Linux Standard Base (LSB).

Performance, Scalability & Responsiveness: Linux-based intelligent devices depend on the performance and scalability of the CPU and OS to meet a range of performance requirements. With its native support for hyper-threading technology and multi-core operation through Symmetric Multi-Processing (SMP), Linux supports greater concurrency, resulting in superior throughput for both compute-bound and I/O-intensive loads. Unlike legacy RTOSes, Linux can scale to take advantage of single-core, multi-core and multi-processor architectures, with complete transparency to application code and without expensive special integration. Moreover, both uni-processor and multi-processor (SMP) configurations of Linux offer rapid response to real-time events for soft, and in some cases, hard real-time application deployment.

Lower Total Costs: The open nature of Linux helps software developers control development, deployment and support costs. The wide choice of vendors, business models, tools and software components for Linux means embedded developers can select the solutions that best meet their technical and budgetary needs. Device developers can also turn to Wind River for cost-effective tools, expert support and services to help streamline their development cycle and bring their products to market quickly.

Development Options: Choice is the key driver for adoption of Linux in embedded software: choice of distribution, choices for community and commercial support, choices from Open Source and commercial development tools, and choice of suppliers for Open Source and off-the-shelf solutions. Wind River brings over two decades of device software experience to bear on providing Linux as a device software platform, with the industry-leading Workbench* development environment, powerful hardware run-control tools, and the Platform for Network Equipment, Linux Edition* (Platform NE). Complementing these Linux offerings are Wind River's expert global support and services organizations.

Enhanced Security: A decade ago, embedded software targeted primarily stand-alone designs whose hardware content dominated engineering design investment. By contrast, today's intelligent devices are highly connected to local and global networks, with software content running into the millions of lines of code. These new applications need to secure functionality and content in a more exposed, networked world. Embedded developers increasingly turn to Linux for its enterprise-ready and robust security mechanisms. Enterprises, financial institutions (like Charles Schwab* and Goldman Sachs*), global communications equipment and services providers (like Alcatel*, Avaya*, Nokia* and NTT*), and even the U.S. National Security Agency all deploy Linux in business- and mission-critical roles. Likewise, embedded software developers are securing their mission-critical applications with Linux.

Greater Reliability: Linux has proven itself in enterprise applications that can require five and six "nines" of availability (99.9999 percent uptime). In embedded applications, Linux starts with integrated support for IA memory management, an advanced implementation of the POSIX process model, and "hardened" kernel and drivers to confer the same fault resilience and availability upon next-generation intelligent devices.

High-Performance Networking: TCP/IP networking evolved with and for UNIX-family operating systems. As the most successful implementation of a UNIX-type OS for both enterprise and embedded applications, Linux carries on this tradition of integrated, quality IP networking. In contrast to the "bolt-on" stacks that accompany many embedded OSes, the Linux TCP/IP stack is built into the Linux kernel for efficient marshalling, routing and filtering of both large and small packets.

2.1 Unleashing the Potential of Intel® Architecture

In the enterprise, Linux helps developers, systems integrators and IT managers build and deploy scalable and reliable data center applications that realize the full potential of IA processors. In embedded systems, developers using legacy RTOSes (lightweight schedulers and executives) often end up under-utilizing the mainstream capabilities of IA processor-based platforms.

Linux and IA together allow intelligent device designers to take advantage of the following breakthrough technologies originally introduced for the benefit of demanding server and desktop systems:

• Hyper-Threading Technology† (HT Technology) • Multi-core Processing

• Intel® Extended Memory 64 Technology§ (Intel® EM64T) • Intel® Virtualization Technology

2.1.1 Linux and HT Technology

While legacy RTOSes excel in supporting multi-tasked/multi-threaded programs, they often do so with non-standard APIs and without full support for concurrency and dispatch enhancements in modern CPUs. The Linux kernel and OS libraries fully integrate HT Technology for transparent acceleration of multi-threaded applications, with standard POSIX multithreading interfaces.

Figure 1: The need for multiple RTOS copies vs. integrated Symmetric Multi-Processing with Linux on multi-core CPUs

[Figure 1 contrasts two separate RTOS instances, each with its own applications, system memory, CPU and cache, against a single embedded Linux image with native SMP managing all devices, system memory and both CPUs.]

2.1.2 Multi-Core Processing

Intel is an industry leader in the design and deployment of multi-core CPUs. Many developers of next-generation embedded applications are planning for and designing multi-core systems, both to increase instructions-per-watt and to realize scalable compute capacity.

As shown in Figure 1, legacy RTOSes offer little or no multiprocessing support, running isolated copies of the RTOS and applications on each CPU core, raising costs and lowering efficiency.

By contrast, Linux offers native SMP support, scalable and configurable for all types of device software. Linux can dispatch workloads to two, three, four or more processors in a system without the need to partition memory, cache or other system resources. For applications that require it, Linux can also bind threads and resources to a single SMP CPU, or a set of CPU cores.

2.1.3 Support for 32- and 64-bit

Most RTOSes evolved to support 16-bit and early 32-bit processors and memory models. Embedded systems of a decade ago that drove the requirements for RTOSes were optimized for scarce RAM and ROM—1 MB of ROM and RAM was almost a luxury; 4 or 8 MB of storage was unheard of. In that stark setting, an embedded kernel needed to fit in 50-70 KB, leaving as much free space as possible for value-added application software and data.

Today’s intelligent devices can take advantage of plentiful and much less expensive RAM and non-volatile memory. Even modest applications can deploy 32 MB, 64 MB, or even larger memory profiles.

Applications built on legacy RTOS platforms, however, must endure the inconvenient legacy of “thinking small”. Many RTOSes have built-in limitations in their kernel constructs, memory allocation and pointer management code optimized for 16- or 32-bit data and addresses. As such, they cannot easily scale to handle the massive address space and heavy 64-bit data throughput of modern applications in networking (routing, deep packet inspection, etc.), storage, database, graphics (high color/pixel) and other demanding areas.

By contrast, Linux on IA allows application developers to take full advantage of Intel 32-bit processors, as well as new Intel® EM64T-enabled CPUs, with more efficient, native support for large data and address types.

2.1.4 Virtualization

Virtualization is the hosting of one or more operating systems in an environment provided by another OS or underlying virtual machine monitor. Virtualization can entail running two or more instances of the same OS, one hosted “on top of” the other, or of different OSes running in parallel.

In enterprise applications, virtualization is utilized in virtual hosting of Web servers and applications, for server consolidation in the data center, for load balancing in virtual cluster configurations, and to provide “sand box” environments for security testing and deployment.

In embedded systems, virtualization:

• Allows developers to run legacy RTOS code on a machine controlled by Linux

• Allows specialized RTOSes to execute Linux as their idle task

• Supports deployment of multiple, highly-secured instances of Linux or other OSes to provide additional layers of security

IA processors with Intel® Virtualization Technology offer a family of extensions that simplify virtualization with Linux. This technology lets Linux-based systems provide more complete instruction sets to hosted OSes and applications running in virtualized environments, with greatly improved virtual environment performance.

Virtualization is also discussed later in Section 5.2.

2.2 Broad Range of Software, Firmware and Hardware

In the last decade, leading RTOS suppliers, including Wind River, attracted a far-reaching ecosystem of third-party software and hardware suppliers to platforms like VxWorks*. While this set of Independent Software and Hardware Vendors (ISVs and IHVs) helped to support the development and deployment of RTOS-based applications, actual support for specific RTOS implementations often proved inconsistent. Uneven ISV support, especially across processor families and software versions, usually required additional investments in porting, integration and quality assurance, raising development costs and hampering time-to-market.

By building on an open and ubiquitous platform like Linux, ISVs, IHVs and their customers enjoy the benefits of verified interoperability and a wide choice of suppliers and pre-tested COTS implementations as noted in Table 2.

This rich set of commercial components complements the vibrant collection of Open Source community resources, giving device software developers unprecedented choice and quality in their development toolbox and deployment bills of materials.

Intel takes Linux and Open Source seriously. Intel has invested substantial resources in Open Source development, in hardware support and in an ecosystem to support Linux adoption in the enterprise. By building with Linux on IA, developers can benefit from the investments by Intel and its ecosystem of hardware and software providers whose offerings complement and extend the value of Intel products.

Table 2: Broad range of software, firmware and hardware suppliers support Linux

Software:
• Device Drivers
• File Systems
• Networking Protocol Stacks
• Middleware
• Multimedia Software and CODECs
• Databases
• GUI
• PIMware and other Handheld Components
• Development and Test Tools

Hardware:
• CPUs and Chipsets
• Board and Bus Support (esp. CompactPCI*/AdvancedTCA*)
• Peripherals and Peripheral Cards
• Graphics Chipsets and Cards
• Connectivity with USB, FireWire*, Ethernet, Bluetooth* and Wi-Fi*
• Acceleration Technologies (e.g., IP Acceleration, Security, Network Processors)
• Hardware Debugging and Test Tools

3.0 Applications Migrating to Linux

According to Venture Development Corporation* (VDC), Linux has climbed from a fairly obscure start in 1999/2000 to occupy the lead slot in market share for 32- and 64-bit designs—see Figure 2. VDC further projects that Linux will top 29 percent of all 32- and 64-bit designs by the end of 2005.

Intel sees Linux deployed as a device software platform for applications ranging from phone switches to firewalls, from machine control to medical monitoring, and from high definition TVs to high-volume cell phones. VDC also confirms this adoption trend with data showing broad adoption across key Device Software design domains, as shown in Figure 3.

3.1 Intel Architecture Application Focus

IA is being deployed in all application categories cited by VDC in Figure 3. However, Linux and IA form an especially compelling combination for applications in storage, security and communications market segments.

3.1.1 Storage

Many enterprise and SOHO storage solutions already build and deploy with Linux on IA to implement SAN, NAS and data management applications. Storage applications can also benefit from high-performance file systems, networking, peripheral interfaces and buses, file formats and protocols native to Linux and IA.

Figure 2: Market Share of Leading Device Software Platforms Source—Chris Lanfear, Lead Analyst, Venture Development Corporation. "Linux Adoption Factors". Embedded Linux: Coming soon to a Device Near You. Ziff-Davis eSeminar, May 2005.

[Figure 2: bar chart of market share by device software platform: Linux leads, followed by Wind River VxWorks, Windows NT/XP Embedded, Windows CE/CE.NET, uC/OS, QNX Neutrino, Mentor Graphics Nucleus, other commercial OSes, proprietary in-house OSes, and no formal OS.]

Figure 3: Linux* for Device Software—Leading Application Types Source—Chris Lanfear, Lead Analyst, Venture Development Corporation. "Linux Adoption Factors". Embedded Linux: Coming soon to a Device Near You. Ziff-Davis eSeminar, May 2005.

[Figure 3: bar chart of leading application types for Linux device software: Consumer Electronics, Telecom/Datacom, Industrial Automation, Military/Aerospace, Office Automation, Automotive, Medical, Building/Home Automation, Retail Automation, and Others.]

Storage designs can choose from a wide range of file system options for rotating, solid-state and networked media, including:

Disk-based File Systems: EXT2, EXT3

Journaling File Systems: XFS, ReiserFS*, JFS, EXT3

Interoperable File Systems: Windows* NTFS and FAT32, Macintosh* HFS, QNX4*, HPFS

Flash File Systems: JFFS2, CramFS, VFS

RAM Disk: RAMFS, pRAMFS

CD-ROM: ISO9660

Network File Systems: NFS 4.0, SMB (Samba), AppleTalk*

Linux offers high-performance native implementations of IPv4 and IPv6, as well as a range of other communications and distributed applications protocols for storage, including AppleTalk, CORBA*, OpenOBEX, OpenSLP, SOAP and other essential standards-compliant transports and interconnects.

Fortune 1000 companies depend on Linux networking and file system performance to meet their enterprise requirements for reliability and throughput. Now, that same capability set can power NAS, SAN or other storage applications.

3.1.2 Security

Enterprise data centers rely on security in the Linux kernel, operating system, and key Open Source technologies to combat networked exploits and intrusions, viruses and other malware. In fact, Linux enterprise deployment had its beginnings in “white box” use for software-based firewalls, directory servers and other authentication infrastructure and security applications.

Designs using Linux and IA can be more secure by taking advantage of mature Linux kernel features and the Open Source projects and products listed below when implementing firewalls, VPNs, virus scanning, intrusion detection and secure management interfaces:

• Firewall • IP tunneling • IPsec • SSH/SSL

• PAM • LSM • Medusa

3.1.3 Communications

Telecommunications Equipment Manufacturers (TEMs) and Network Equipment Providers (NEPs) are increasingly turning to Linux and IA-based COTS hardware as the means to support the next-generation build-out of converged voice and data. Intel is a leader in this trend with its focus on modular computing and support for Advanced Telecom Computing Architecture* (AdvancedTCA).

Intel fully supports AdvancedTCA as a key enabling technology for a modular network and believes that it provides the reliability, manageability, and performance required to create carrier-class network elements based on open standards. For NEPs and OEMs, Intel offers a portfolio of boards and platforms designed for AdvancedTCA specifications that provide next-generation performance and high availability for carrier-grade wireless and telecom infrastructure applications.


Starting five years ago, Intel worked with key TEMs, NEPs, industry-leading systems suppliers, vendors, and network operators to create the Carrier Grade Linux initiative. Today, the initiative boasts several dozen vendor participants and has produced three generations of the Carrier Grade Linux Requirements Specification.

Wind River Platform NE, Linux Edition and IA are enabling carrier-class infrastructure and other networking elements to support voice, video and data convergence. Data or telecommunications designs can benefit from Wind River Platform NE on IA, especially for core/edge applications, such as:

• Access systems and wireless infrastructure • Media gateways • Voice-Data-Video “Triple Play”

• Network security • High-performance routing and router management

Wind River Platform NE, Linux Edition contains an OSDL Carrier Grade Linux-registered Linux distribution.

4.0 Comparing Legacy RTOS and Linux System Architectures

Migration from a legacy RTOS, whether a commercial product like VxWorks, pSOS*, Nucleus* or VRTX*, an Open Source executive like eCos, uC/OS, or RTEMS, or an in-house platform, involves more than just recompiling source code. Long before typing “make”, a developer will need to gain an understanding of the architecture of their legacy OS (source OS), of Linux (target OS), and of how and whether the constructs of the source OS map onto Linux.

Broadly speaking, a developer will need to understand the following core concepts and how these concepts differ across their source OS and Linux:

• Physical and logical memory • Tasks, processes and threads • Inter-task and inter-process communications

4.1 Memory Architectures

Most commercial and almost all in-house RTOS systems operate in physical memory1—that is, addresses correspond to actual memory locations containing data and/or memory-mapped I/O registers. Running in physical memory is the most natural and common architecture for embedded OSes and lightweight applications, and it yields high performance.

Physical addressing, however, does have important disadvantages:

• All code and data in RAM are exposed to accidental (or malicious) modification by any and all programs running in the system

• The OS itself and all memory-mapped I/O are also exposed to modification

• Task/thread stacks can overwrite one-another (underflow)

• Overwriting and other untoward access—when it does occur—happens “silently” and may go undetected for arbitrarily long times (unbounded)

Linux—by contrast—employs virtual addressing2 wherein all programs in the system (including the Linux kernel) operate in and with logical addresses. Linux virtual addressing presents a more robust application and system programming model to the programmer. Applications execute in their own protected address spaces, and are, for the most part, invisible to one another, as shown in Table 3. They are also prevented from overwriting their own code through the use of hardware-based Memory Management Units (MMUs) present on most modern 32- and 64-bit processors.

While user programs share a virtual address space with the Linux kernel, they cannot overwrite kernel code or data. Since applications/processes cannot “see” one another (they reside in unique virtual address spaces), they cannot corrupt each other’s data or code3.

Table 3: Comparing RTOS and Linux Available Resource Protection

                          RTOS                               Linux*
Protection Type           Ad hoc, if any (development only)  Process / Page-based
Application Data          Exposed                            Protected
Application Code          Exposed                            Protected
OS Kernel Code (RAM)      Exposed                            Protected
OS Kernel Data            Exposed                            Protected
Tasks/Thread Stacks       Exposed to under-runs              Protected with Guard Pages
Memory-Mapped I/O Ports   Exposed                            Protected, mapped with mmap()

1 E.g., VxWorks 5.0 and earlier and similar RTOSes.
2 Virtual addressing should not be confused with virtual memory. Virtual addressing is the mapping of pages of physical memory (RAM/ROM, etc.) into logical address spaces (from the program's and even the kernel's point of view). Virtual memory is the extension of available (physical) memory by swapping pages or whole processes out to disk or other non-volatile memory.
3 On 32-bit systems, the Linux kernel occupies the upper 1 gigabyte of virtual address space while the remaining 3 gigabytes are available for use by each process. Note that different processes can (re)use the same virtual addresses.

4.1.1 Access Violations

Another notable difference between legacy RTOSes and Linux is the response to faults. When (and if) structural faults (like the memory protection issues in Table 3) are detected, the scope of failure in an RTOS is the whole machine—not because all faults impact the entire system, but because an RTOS has no safe way to recover resources and restart running applications, as shown in Table 4.

With Linux, attempts to write over program code or to access unmapped addresses result in an immediate access violation, generating a segmentation fault signal (SIGSEGV). If the running process is not equipped to handle a SIGSEGV, then that process terminates, leaving the rest of the system intact.

Trapping access violations limits fault scope to single processes and allows the Linux kernel to recover resources in use by that process. In some cases, it is possible to implement fault handlers that “fix” access violations and allow programs to keep running. For example, with stack underflow, Linux can be configured to allocate new stack pages as needed and continue execution, avoiding the fault altogether.

4.2 Tasks, Processes and Threads

Legacy RTOS systems have as their basic scheduling unit the “Task”. A task is a definition of execution state that usually contains the CPU register set, the program counter, and a stack pointer. Tasks are “lightweight” in that a context switch involves the saving and reloading of only these elements.

Traditional UNIX had as its scheduling unit the “Process”, which carried the same state as an RTOS task, plus the addition of MMU page table settings (memory mappings) that define the size, virtual, and physical addresses of the pages of memory “owned” by that process. Linux, as a modern instance of a UNIX/POSIX operating system, treats processes as memory containers only. The basic scheduling unit for Linux is the “Thread”, a lightweight context very similar to RTOS tasks that executes address-wise inside the container provided by a Linux process, as shown in Table 5.

The similarity of RTOS tasks to Linux threads provides the shortest path for application migration: RTOS tasks residing in a flat physical memory space migrate over to Linux and are mapped to POSIX threads residing in Linux virtual-address process containers, as depicted in Figure 4.

Table 4: Fault Response Capabilities in Legacy RTOSes vs. Linux

                           RTOS            Linux*
Program Unit               Task            Process / Thread
Failure Granularity        Whole System    Single Process
Fault Notification         None            Parent-Child / Signals
Safe Task Restart          No              Yes
Grow Resources             No              Yes
Dynamic Memory Recovery    No              Yes
Recover System Resources   No              Yes

Table 5: Comparing RTOS Tasks with Linux Processes and Threads

Scheduling Unit   Context                                Addressing
RTOS Task         Registers, PC, SP                      Physical
Linux* Thread     Registers, PC, SP                      Logical (in process address space)
Linux Process     Context of 1st Thread, Memory Mapping  Logical

Figure 4: Mapping RTOS Tasks to Linux Threads in One or More Processes

[Figure 4 shows RTOS tasks in flat physical memory under an RTOS kernel mapping one-for-one onto Linux threads grouped into one or more processes under the Linux kernel.]

4.3 Inter-Task Communications

Legacy RTOSes like VxWorks provide a library of mechanisms to facilitate communications among two or more running tasks in a system. These Inter-Task Communication primitives, or ITCs, include familiar constructs like semaphores, mutexes, message queues, mailboxes, etc. Their various roles include synchronization, mutual exclusion, signaling, payload delivery and formal data sharing.

The good news is that Linux provides an extremely rich set of equivalents. The less-good news is that the mapping is not always one-to-one. In most cases, since RTOSes were often written by programmers familiar with UNIX, direct analogues do exist. In other cases, more than one possible mapping exists, with one construct applying to communication among processes (Inter-Process Communications, or IPCs) while another applies to interaction among threads inside a process. And, in a few cases, RTOS constructs are unique and simply do not exist in Linux.

Table 6 illustrates the mapping of the most common communications mechanisms from RTOS inter-task versions to Linux-based inter-process and inter-thread equivalents4.

Which communications constructs a developer chooses will depend on the porting path and the execution architecture chosen for legacy systems running under Linux.

Table 6: Mapping RTOS Inter-Task Communications to Linux* Inter-Process and Inter-Thread Versions

RTOS Inter-Task                         Linux* Inter-Process                Linux Inter-Thread
Semaphores (Counting and Binary),       SVR4* Semaphores                    pthread Mutexes, Condition Variables
  Mutexes
Message Queues and Mailboxes            Pipes/FIFOs, SVR4 queues
Shared Memory with formal mechanisms    Shared Memory with shmop() calls    Threads share named data structures
  or through named data structures        or with mmap()                      in a process-wide namespace
Events and RTOS Signals                 Signals, RT Signals
Timers, Task Delay                      POSIX timers/alarms, sleep()
                                          and nanosleep()

4 Note that there exist detailed semantics for how POSIX threads handle inter-process mechanisms, e.g., signals. A discussion of these fine points is beyond the scope of this White Paper.

Figure 5: Comparing Three Migration Run-Time Architectures for Legacy Applications under Linux

[Figure 5 depicts three run-time architectures on Intel® Architecture platforms: RTOS run-time emulation over Linux, with RTOS emulation libraries between the RTOS application and the Linux kernel; a partitioned run-time with virtualization, where Linux and an RTOS (each with its own libraries and drivers) run side by side above a virtualization layer; and a complete native Linux port, with the application running as Linux processes and threads over Linux libraries, kernel and drivers.]

5.0 Run-Time Architectures

There is no single correct path to running legacy applications under Linux. While the most logical and most robust migration results from the creation of a 100 percent native Linux application, several migration paths exist:

• RTOS Emulation under Linux • System Partitioning and Virtualization • Native Linux Applications

Figure 5 provides a schematic of these three options, and the following sections offer details and popular implementations for each.

5.1 RTOS Run-Time Emulation

Run-time emulation involves, at a minimum, servicing the RTOS system calls needed by legacy applications. In some cases, those calls map neatly onto Linux equivalents. In others, new code must be interjected either to “massage” system call parameters, or to emulate missing functionality. Emulation can be lightweight, with libraries implementing small or large subsets of RTOS system calls and/or run-time libraries, or comprehensive, with 100 percent of legacy RTOS calls handled under Linux.

5.1.1 Emulation Libraries

Legacy RTOS systems like VxWorks can present hundreds of system calls and library APIs to developers and their applications. In fact, VxWorks documentation describes over one thousand unique APIs. Most applications, however, use several dozen common APIs that are unique to a legacy RTOS, with the rest calling routines from standard C or C++ libraries supplied with RTOS run-times or with cross compiler tool kits.

Multiple embedded Linux suppliers offer emulation libraries to ease migration of legacy code from VxWorks and other RTOSes to their Linux platform products. These migration kits usually limit themselves to two or three dozen system calls and APIs and are much better suited to prototyping than to deployment.

There is also an independent vendor, MapuSoft*, whose OS Changer* product supports translation from several different legacy RTOSes to Linux, with fairly comprehensive API support. To learn more about OS Changer, visit http://www.mapusoft.com.

5.1.2 Full RTOS Emulation—Wind River VxWorks Emulation Layer*

An interesting means to assure 100 percent compatibility of legacy programs is to run a complete copy of the legacy RTOS “underneath” legacy application code. If a developer's RTOS supplier (or in-house IP guru) allows them to deploy a copy of the legacy RTOS running as part of a user application under Linux, AND if their team can easily port that RTOS to Linux, then Full RTOS Emulation presents a nice shortcut to the migration process.

Wind River has plans to make this process simple by offering developers a VxWorks emulation layer for Linux based on Wind River’s VxSim technology. This technology is a complete version of the VxWorks kernel and libraries ported to run inside a single Linux process. It has been optimized to run on Linux for IA, but since it emulates the APIs and semantics of a full VxWorks OS, a developer will be able to use it to migrate code from almost any processor-targeted version of VxWorks. To learn more about this technology, contact Wind River Systems.

The VxWorks emulation layer for Linux technology is best employed initially in a prototype environment. While it is certainly possible to build a deployment system using VxELL, a developer should note three limitations to the VxWorks emulation layer approach. First, carrying around the 100K-200K needed for the Wind kernel and libraries increases the footprint of a developer’s new Linux-based run-time image. Second is that all of the VxWorks emulation layer—the VxWorks kernel and all the tasks in an application—appears to Linux as a single thread running in a single process. As such, while a developer’s application tasks will retain the interaction and semantics of multi-tasking VxWorks applications, they will be scheduled to run “against” each other only when the entire VxWorks emulation layer is running5. Third is that by default, the VxWorks emulation layer is entirely self-contained and does not support hardware access and operation.

5.1.3 Device Drivers and Emulation

Device software has traditionally been defined by its emphasis on I/O and complex interaction with application-specific hardware. Process-based emulation begs the question: “What about Device Drivers?”

If a developer plans to employ RTOS emulation only for prototyping, then they can defer this question to a later stage in their migration. If they plan to deploy systems built with RTOS emulation, they will need to enable user-space memory mapped I/O and/or develop Linux native device drivers. Both topics are discussed in later sections of this white paper.

5.2 Partitioning and Virtualization Virtualization is the provision of extremely complete hosting environments wherein one operating system runs as an application “over” another, or where a piece of system software (running on “bare metal”) hosts the execution of two or more operating systems or OS instances. While virtualization has traditionally found application in enterprise settings, with Linux, virtualization technology is becoming mainstream in the data center, on the desktop and also in device software.

Intel® Virtualization Technology offers special instruction set optimizations that facilitate virtualization. In particular, the technology allows the “guest” operating system secure access to an extremely complete instruction set, enhancing performance by limiting the need for traps and emulated instructions.

5.2.1 Enterprise-type Virtualization

In enterprise computing settings virtualization is used for server consolidation, load-balancing, establishing secure “sandbox” environments, and for legacy code migration. It is in this last capacity that we are of course interested. Enterprise-type virtualization projects and products include the *, User Mode Linux, VMware*, Virtual Iron* and others.

While enterprise-type virtualization can be highly performant, it probably outstrips the needs of all but very high-end and highly-available applications. Most device software developers will probably want to use this kind of virtual machine capability for prototyping, or for hosting cross development from unsupported platforms (e.g., versions of Windows to embedded Linux).

5.2.2 Embedded Virtualization

Enterprise-type virtualization focuses on the creation of execution partitions for each guest OS instance, and the different virtualization technologies strive to enhance performance, scalability, manageability and security. By contrast, embedded virtualization involves system partitioning to host (at a minimum) an RTOS and one or more “application” operating systems, most commonly versions of Linux.

5 Sometimes called “wheels within wheels”. Also comparable to “green threads” in Java implementations.

The hosted RTOS serves the following system functions:

• Migration of legacy RTOS applications (cp. RTOS Emulation)

• Hosting of RTOS-specific middleware (e.g., protocol stacks)

• Real-time responsiveness

While the instance(s) of Linux offer:

• A standards-based COTS software platform

• Enterprise-ready secure interfaces for networking, Web applications, etc.

• Interoperability with data center and desktop systems

• Access to a vast ecosystem of Linux-ready middleware and applications

The topic of Real-time is discussed later in Section 7.1.

5.3 Native Application Execution Under Linux Emulation and virtualization provide excellent bridges for prototyping, development and even for deployment of legacy RTOS applications under Linux. These two technologies, however, have the drawback of requiring the inclusion of additional code and/or underlying infrastructure. Going native on Linux offers significant advantages in terms of reducing complexity, simplifying licensing, ensuring (future) portability and ultimately enhancing performance.

Many projects end up “splitting the difference”. In their first generation migration, they build on emulation and virtualization. After gaining more familiarity with development tools and run-time characteristics for Linux, they re-engineer their legacy applications for native Linux execution.

The process of re-coding for native Linux execution need not prove too daunting. A common approach is to select pieces of legacy applications for native migration with minimal interdependencies and to migrate and recode them in their own process space. Also common (and recommended) is creating any new functionality not in an emulated/virtualized context, but rather in a native one.

6.0 The Migration Process Having surveyed the run-time options, readers of this white paper are now ready to make some tough choices about how best to accomplish their migration efforts.

6.1 Choosing a Migration Path The discussion of Run-time Architectures in Section 5.0 laid out a number of paths for a migration project:

• RTOS emulation with Linux-based run-time libraries

• RTOS emulation via wholesale inclusion of the RTOS, libraries and application in a single Linux process

• Virtualized, partitioned execution with RTOS+Application in one partition and Linux+Application in another

• Complete native Linux port of legacy application to Linux

All but the last path involve the preservation of RTOS-based legacy code intact, with run-time support from emulation libraries or an actual copy of the legacy RTOS running under Linux. If a developer chooses any of these three options, they can skip the rest of this section and proceed to Section 7.0 Meeting Key Migration Challenges. If, however, a developer is contemplating an actual port of their legacy code, then continue reading.

6.2 Choosing a Migration Architecture Section 4.2, Tasks, Processes and Threads, illustrated how the closest analog to RTOS tasks can be found in Linux threads. The strength of this analogy provides guidance in the choice of migration architectures.

6.2.1 Initial Porting Effort—Single Process, Multi-Threaded

The most logical migration architecture, then, is to move legacy task code over to execute as Linux threads in a single Linux process. This approach has the virtues of:

• One-to-one mapping of tasks to threads

• Source and target application both build in a single C or C++ name space

• Continued use, as needed, of informal data sharing through global variables

• Options for migrating hardware interface code to Linux drivers or attempting to keep using it in-line

• Scalability to multi-board legacy systems—each legacy CPU board can map to its own process running under Linux on a more powerful IA CPU

A developer may stop at this stage and deploy, or proceed to further refinement.

6.2.2 Subsequent Refinement—Multi-Process, Multi-Threaded

Extremely complex legacy RTOS applications, with hundreds or thousands of threads and large and complex shared data will benefit from further decomposition. For simplicity of porting and maintenance, it is advisable to break such legacy applications, along subsystem or other functional lines, into multiple Linux processes. Decomposition provides an excellent opportunity to optimize inter-task communication and synchronization, while translating those connections into Linux inter-process mechanisms that can include pipes, queues, signals, sockets, semaphores, and shared memory.

Good candidates for decomposition include:

• Embedded Web servers
• CGI programs
• Watchdogs
• Other service daemons
• User Interface code
• Management interfaces

Also, code that implements functions native to the Linux OS should be abstracted away, especially file systems, shells, TCP/IP stacks, and other networking utilities.

6.2.3 Adding New Code

When adding new code to a migrated legacy system, a developer now has the choice of augmenting the original mono-process/multi-threaded core or of making those additions in a new process space. Also, the Linux OS and the greater body of Open Source code is so rich that a developer may not even need to implement new functionality from scratch. Instead, a developer can take advantage of existing middleware, applications and tools built as stand-alone processes, with formal interfaces available to their own value-added application code.

6.3 Mapping Legacy RTOS APIs and System Calls to Linux Equivalents While the benefits of moving to a Linux process-based programming model are enticing, a developer still must address the particulars of moving their application’s use of RTOS APIs over to those in Linux. The good news is that Linux features a rich set of APIs; the bad news is that their code may use calls and features that do not readily translate into Linux.

Application code accesses system calls via libraries that act as “wrappers” for the system calls or implement the entire functions in library code without ever calling the kernel. Examples of the first case are task creation and scheduling calls, like VxWorks taskInit(); examples of the second include library-based threading schemes on older UNIX systems and queues in some

OSes. In practice, a developer’s legacy RTOS-based application probably makes no distinction between system calls and library functions, and may leverage dozens or hundreds of available APIs under an RTOS or Linux.

RTOSes accrued thousands of APIs over the years and it is not practical to address all those interfaces. Rather, a pragmatic approach is to translate and emulate a core set of the four or five dozen most common calls, and to leave the rest for ad hoc translation and implementation.

6.3.1 Example Mappings

Table 7 provides a few examples of mapping core RTOS (from VxWorks) functions onto their Linux equivalents. Such calls are semantically similar enough to serve as illustrations of compatibility, if not interoperability. Key differences can include:

• Type and Number of Parameters: e.g., parameter lists passed to tasks and threads at creation time—VxWorks allows passing exactly ten parameters of type int, while the Linux pthreads library uses an argument pointer (void *arg) to an arbitrarily long list of parameters of arbitrary type.

• Synchronous/Asynchronous Behaviors: e.g., VxWorks semTake() specifies a timeout value in system clock ticks, while Linux sem_wait() waits forever, or until a signal is received.

• Granularity of Calls: some functions in VxWorks need two calls in Linux, and vice-versa.

Table 7: Mapping core RTOS functions from VxWorks to Linux

Call Type                      VxWorks* Call    Linux* Equivalent
Task Creation                  taskSpawn()      pthread_create() or fork()
Message Queue Instantiation    msgQCreate()     mq_open()
Acquire Semaphore              semTake()        semget() and sem_wait()
Wait                           taskDelay()      sleep() and nanosleep()

6.3.2 API Reconciliation

Let’s examine how to bridge the gap between legacy RTOS calls and candidate equivalents in Linux. The goal here is to avoid recoding legacy code and instead to use #include files and macros to “massage” the source and target APIs.

• One-to-One Mapping with POSIX, SVR4 and BSD Libraries Many RTOSes, including VxWorks, offer POSIX and other UNIX API sets, as either primary or secondary means of performing common operations. If legacy code employs those APIs (instead of equivalent “native” RTOS calls) then a developer's job is done!

• Parameter Massage In many cases, calls from the legacy RTOS are convertible to Linux APIs through judicious use of #define macros and casting. This simple solution applies best to changing API names, parameter reordering and modifying parameter and return types.

• Wrappers To make multiple Linux calls from a single RTOS API, to inject new parameters into a call, or anywhere that outstrips the capabilities of a macro, a developer may need to write slightly more extensive wrapper code. Whenever possible, take advantage of the inline extension in gcc (even for C language code) and place wrapper functions inside the #include files.

• Per-function Emulation Sometimes the gap is just too far to bridge. Divergent semantics, synchronous/asynchronous behaviors, or the complete absence of analogous APIs ultimately requires the insertion of new code. If only a few APIs “get away”, then a developer's team is probably up to the task of writing emulation functions. For crossing larger compatibility “chasms” (e.g., entire libraries or RTOS API sets), a developer can turn to kits from embedded Linux vendors and a handful of independent software vendors.

6.4 Stepwise Migration Unfortunately, there is no single road “through the woods”. There are, however, some key phases and logical steps common to most migration efforts.

6.4.1 Use the Current Build Environment

If a developer’s legacy RTOS already builds on GNU tools (gcc compiler, gas assembler, etc.), then they can skip to the next step. If not, a developer can benefit from rebuilding their legacy code using a version of gcc that is native to their environment, or a cross version hosted there. By beginning with the GNU tool chain, even building on their legacy build scripts, they can shake out incompatibilities between their legacy code base and the GNU tools in a “safe” environment.

A developer should go as far as possible in their legacy build environment, whether cross compiling for their legacy hardware or natively compiling on their IA-based Windows or Linux host.

6.4.2 Using Cross or Native Build System for Linux

Now, a developer can actually pack up and move to a Linux-based run-time. Since this white paper is discussing Linux on IA, a developer can proceed either by using a workstation Linux implementation (e.g., *, SuSE*, Mandriva*) and by building for native/local execution, or for cross deployment to an IA-based single-board computer (a standard motherboard, ATCA, CPCI, PC/104, etc.).

• Isolate Legacy-specific Header Files, Libraries, etc. The developer starts by removing ALL RTOS-specific headers and libraries from their build, instead using implicit gcc link paths and explicit ones to libraries as needed. If the build scripts include explicit path references, comment them out. If the build depends upon detailed linker layout files, set them aside (for now). The developer’s goal should be to make their build as “generic” as possible.

• Modify Make/Build Files and Try Building If a developer’s legacy build engine is not overly complex, they should be able to modify their main make file(s) or other build scripts to run on and for Linux.

When a developer tries to build on Linux and for Linux, they will likely find themselves deluged with error and warning messages, but that is to be expected.

• Analyze Build Output The developer needs to capture the output/error listing from their Linux-based make to a file. They should see a mix of:

1) Undefined symbol references from legacy RTOS library and system calls

2) Include files not found

3) Undefined symbol references from missing valid Linux include files

4) Type mismatches from legacy symbols/names with analogs in Linux and parameter list mismatches

The developer should start with APIs they know. The Linux man command is a developer’s best friend here—use apropos or man -k to search for likely Linux versions of legacy functions. Sometimes APIs will exist for Linux but not be installed on a developer’s local machine or in their supplier's cross development kit. Here, * is an ally.

6.4.3 Incremental Porting

Some portion of early build pass error messages will result from mismatched but comparable constructs. A developer can choose to recode their legacy programs to conform to Linux, or in many cases they can construct macros to massage types and parameters to suit their new system.

The remainder of the errors and warnings will arise from “missing” content—that is, RTOS APIs that have no analog in Linux, e.g., mailboxes, preemption disabling in user space, “exotic” scheduling paradigms, explicit manipulation of RTOS task control blocks, etc. At this juncture, a developer has a few choices:

• Create local emulation libraries that implement the missing function(s) or extend commercial libraries

• Acquire off-the-shelf compatibility libraries (e.g., OS Changer) or use Open Source project code (e.g., and legacy2Linux)

• If possible, implement the functionality through gcc inline extensions and macros

• Remove the code from the legacy base entirely. For I/O, a developer will want to migrate it, at least conceptually, to a Linux device driver

6.4.4 Tune Emulated Application

When a developer does get their legacy code to build completely, and then to execute, they need to build on their legacy unit and systems tests to test for correctness and performance. Some parts of a developer’s application will migrate sub-optimally; others will exhibit marked improvement in their new environment.

7.0 Meeting Key Migration Challenges There remain a few important technical areas with impact upon migration that merit discussion. These are Real-time, Data Types, Time Management, and Device Drivers:

7.1 Real-Time Responsiveness Venture Development Corporation reports that the perception of Linux real-time responsiveness (or lack thereof) remains today the single largest barrier to Linux adoption in embedded software. In part, this perception reflects reality—Linux is not an RTOS. It is a high-performance GPOS (General Purpose OS) architected for throughput and not for minimal latency and maximum determinism. However, from investments to improve overall performance, and from specific work to reduce latency, Linux today satisfies 87 percent of embedded application requirements for real-time.6

How can a developer determine if Linux on IA will meet their application real-time response needs? First, the developer and their Linux supplier (if any) must agree on common terms to describe real-time responsiveness, and second they will need to audit the response needs of their application design using the same terms and concepts.

Real-time performance metrics, despite their basis in hard science, are often “redefined” in the marketplace. Moreover, embedded developers who “consume” this information are accustomed to its implications for lightweight RTOS systems, but not necessarily for Linux. To align the discussion, which includes both legacy RTOS systems and embedded Linux, the following discussion (re)examines a few key concepts: Interrupt Latency, Preemption Latency7, and Context Switch.

7.1.1 Interrupt Latency

Interrupt Latency (see Figure 6) is the time from the assertion of an interrupt signal at the hardware level through entry into an Interrupt Service Routine (ISR) that handles that interrupt. While Interrupt Latency is often cited as a key metric for embedded OSes, it is more often a measure of how quickly software can “get out of the way” than an indication of OS performance.

On “bare metal” benchmarks, with a lightweight kernel (or no OS) and no load, Interrupt Latency becomes merely a measurement of vectoring overhead. More realistic tests of Interrupt Latency are subject to a variety of delaying factors, including hardware propagation, software and hardware disabling of interrupts (usually by OS kernels and device drivers), and time needed to resolve shared interrupts before dispatching the ISR for execution.

RTOS and Linux kernels themselves disable interrupts in (hopefully) short critical regions when performing interrupt-related and/or non-reentrant operations. More frequently, interrupts are disabled for longer periods in device drivers, which is why loading is key to arriving at realistic latency metrics. More often than not, measurements of Interrupt Latency are reflections of device driver quality and not of OS performance.

Independently published benchmarks8 for Linux Interrupt Latency on Intel Architecture9 reveal average latencies in the 10–15 microsecond range, with driver-induced worst cases extending as long as 150–200 microseconds; kernel-based worst-case Interrupt Latencies run between 40 and 80 microseconds.

6 Real-time Requirements Satisfied by Linux for Respondents Using Linux. Chris Lanfear, Lead Analyst. Venture Development Corporation (VDC), 2005.
7 Also known as Task Response Latency and sometimes Scheduling Latency.
8 White, Brandon. "Linux 2.6: A Breakthrough for Embedded Systems." LinuxDevices.com. September, 2003.
9 1 GHz Intel® Pentium® III processor.

Figure 6: Components of Interrupt and Preemption Latency and the relationship among them

[Figure 6 depicts a timeline: hardware delay, interrupts-disabled time, vectoring, dispatch, and ISR/driver execution together constitute Interrupt Latency; Interrupt Latency plus kernel, scheduler, and context switch time through to the re-scheduled task or thread constitutes Preemption (or Task Response) Latency.]

7.1.2 Preemption Latency

Preemption Latency (see Figure 6) comprises a more complex sequence. Consider a high-priority application thread that is suspended pending synchronous I/O. Preemption Latency, then, is the time from the interrupt occurring for that pending I/O through the interrupt processing sequence, the preemption of any executing threads of lower priority, and the scheduling and dispatch of the suspended thread whose I/O just occurred. Based on the presumption that “work” is performed by application threads, Preemption Latency is a more useful benchmark of OS agility and how quickly it “gets to work”.

Preemption Latency is subject to system loading, so testing quiescent systems will reveal little or nothing except overhead figures. Remember, Preemption Latency is subject to variations in Interrupt Latency as well as the length of regions in the kernel and drivers where preemption is disabled (locks) and to time spent in scheduling and dispatching I/O-pended application threads.

Independently published benchmarks for Linux Preemption Latency on IA exhibit a wide range of variation (based on load). Worst cases, even for a 2.6 kernel, can surpass 1–2 milliseconds, while average Preemption Latencies run in the 100–200 microsecond range.10 As with Interrupt Latency, device drivers can contribute substantially to worst case numbers.

7.1.3 Context Switch

RTOSes usually run entirely in privileged mode (supervisor or system mode), with the result that the measurements of Context Switch are reduced to a measurement of scheduling overhead. In fact, most RTOS Context Switch benchmarks involve two tasks of equal priority that yield to one another in an otherwise quiescent system.

By contrast, Linux exhibits several distinct context types and a variety of Context Switches among them:

• Between threads within a process address space

• Between threads in different processes

• Between user mode and system mode operation

Reproducing lightweight Context Switch benchmarks applied to RTOS code is best accomplished by using comparable threading constructs on Linux. However, even on an RTOS, Context Switch alone provides a very artificial metric out of context—that is, Context Switch is usually more interesting as a component of measures of Preemption Latency.

10 Ibid.

Yield time, thread-to-thread on Linux, as with most RTOSes, is computationally trivial and benchmarks run to 1 microsecond or less. More important is Context Switch based on contention for a shared object. If Linux needs to make a system call to test the state of a semaphore or mutex, the overhead can send lightweight context switch times over 5 microseconds. That is not an egregious number by itself, but legacy applications can switch context thousands of times per second, e.g., legacy network stacks that deploy one thread per layer and so switch context at least twice per packet processed. Microseconds quickly add up. Cumulative synchronization overhead is why the Linux community is investing in faster synchronization mechanisms like the .

7.1.4 Options for low-latency response in Linux

There is no single “right way” to address real-time requirements with Linux. Options include:

• Use the preemptible Linux kernel capabilities native to the 2.6 Linux kernel

• Other low-latency patches to the kernel from Ingo Molnar, MontaVista* and others

• Use of sub-kernels like RTLinux* and RTAI*

• Application partitioning to let a legacy RTOS itself field real-time sensitive events

• Tuning the kernel and device drivers to support application-specific real-time needs

• Performing time-sensitive operations in the “top half” of drivers instead of in application code (appropriate for small numbers of interrupting devices)

The 2.6 Linux kernel offers developers interfaces that permit logging worst-case Interrupt and Preemption Latencies. In particular, /proc/latency can be configured to log worst-case interrupt-off and preemption-off times to a static array, which a developer can mine for worst-case statistics.

Embedded Linux vendors also offer tools that automatically collect data from /proc/latency and display it for a developer’s team. Other suppliers offer open and proprietary tools that can measure latencies using different software and hardware techniques.

7.1.5 A Developer's Real-time Decision

If pervasive, “hard” real-time responsiveness is a primary design criterion, a developer probably wouldn’t even be considering Linux for their next design. However, developers who are targeting applications with constrained real-time requirements (or none at all), and who are interested in Linux because it is open with a vibrant ecosystem around it, have the following choices:

• Use Linux “as-is”

• Tune Linux and their application to meet their real-time needs

• Source real-time enhancing technology and services from Wind River or another Linux ecosystem supplier, e.g., a member of the Intel Communications Alliance

7.2 Migrating Data Types A “silent” problem that can occur during migration is the mismatching of data types and orientation between a developer’s legacy hardware, OS, tools and applications, and those presented by Linux running on IA, the tools used to build it and their application.

7.2.1 Data Sizes

To ensure a smooth transition, it is advisable that a developer review (or even inventory) the data types used by their application. A good place to start is with the core data types (starting with int and float) defined in include files associated with their legacy OS, by the compiler(s) used to build their application, and by the application itself.

A developer can find the Linux equivalents on most Linux hosts in the following directories

/usr/include /usr/include/linux

and in similar locations in sub-trees in cross development environments.

If a developer encounters mismatches, they will need to consider the following issues before proceeding:

• Does the code have explicit (and implicit) dependencies on data type sizes?

• Will struct layout affect code execution and semantics? (important for unions and device registers)

• Will the target data type handle all possible results of calculations? (smaller base type)

• Will correct sign extension occur for all calculations? (larger base type and/or change to unsigned)

• Will bit-wise operations be preserved for bit fields?

• How will data type changes affect device interface code?

If a developer wrote their application for portability (or ported it across legacy systems previously), most of the issues listed above will already be addressed by their coding standards and/or by use of ANSI C types. If not, the developer will have to carefully audit their code or devise intermediate include files that “massage” data types and type names accordingly.

7.2.2 Little and Big Endian Issues

IA is little-endian, which means that words are ordered in memory starting with the least significant byte and proceeding in order to the most significant byte. This ordering is true for all integer data types. Many legacy hardware systems11 implement the opposite, big-endian ordering. Others come in both little and big-endian implementations, and some may even combine both orderings in a single system.

Most compiled code processes “endianness” transparently. Key exceptions include:

• Byte ordering in hardware device registers

• Byte-level layout in struct and union constructs

• Networking packet marshalling (which is ALWAYS big-endian)

Exposure to byte-order specifics is motivation to migrate device access code into formal Linux device drivers (and not perform it in-line). Data type aliasing with struct and union is seldom portable across compilers, let alone hardware and OS architectures, so migration is a good time to implement this kind of code anyway. Also, networking packet marshalling is usually best left up to the OS or to well-matched middleware. If a developer's application value-added involves access to raw packets, Linux does provide a series of APIs to flip word ordering as needed.

7.3 Time Management Legacy RTOS-based applications frequently build on both software and hardware timers for time management functions and for task execution delay (e.g., VxWorks taskDelay()). Fortunately, Linux supports a variety of clocks, timers and alarms, and APIs for delays of different lengths and resolutions.

7.3.1 Timers

RTOS timer constructs map well onto Linux interval timers and alarms (setitimer() and alarm()), which generate signals on timeout.

Important differences between Linux timers and most RTOS timer implementations include:

• RTOS timer APIs tend to quantify time in terms of RTOS system clock ticks, while Linux uses “real” time (seconds, microseconds, or nanoseconds).

• Time-out behavior with RTOS timers usually involves interrupts, whereas Linux uses signals to notify applications of time completion.

• Whereas most RTOSes can manage only a handful of timers, the Linux kernel can create and maintain very large numbers of software timers (even thousands of them) with minimal overhead.

RTOS task delay calls map fairly neatly onto a family of Linux/POSIX sleep APIs: sleep(), usleep() and nanosleep() suspend the caller for the specified number of seconds, microseconds or nanoseconds, respectively. There is also a shell version of usleep useful for scripting.
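As an illustration, a millisecond-granularity delay in the spirit of taskDelay() can be wrapped around nanosleep(); the helper name ms_delay() is our own, not a standard call. Note that nanosleep() may return early when interrupted by a signal, so the wrapper resumes with the remaining time:

```c
#include <time.h>   /* nanosleep(), struct timespec */
#include <errno.h>

/* Sleep for approximately 'ms' milliseconds, resuming after signals. */
int ms_delay(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;  /* real failure */
        req = rem;      /* interrupted: sleep for the remainder */
    }
    return 0;
}
```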

While Linux is quite adept at time management in general, sleep calls (and timers, too) depend on the resolution of the system clock and so seldom deliver the nominal resolution of their parameters (e.g., nanoseconds).

While Linux offers the above timer and delay interfaces, their clock resolution is rather coarse compared to legacy RTOSes. Linux uses its preemption clock as the time base for all such calls, and the available resolution of that clock is 1 ms (10 ms with 2.4 kernels). So, while APIs exist with parameters specified in microseconds and even nanoseconds, delay counts will always resolve to this coarser granularity.

Several paths exist for achieving higher clock resolution with Linux. If unused timers are available in a developer's hardware design, they can implement their own interrupt-based delay mechanism (as they would with an RTOS). If a developer is using a sub-kernel or an RTOS running in a virtualized partition, that subsystem may also offer finer time bases.

A more interesting option is to enable Linux with native higher-resolution timers. An Open Source project, the High Resolution Timer Project, is working toward that end. Since its original purpose is to satisfy Carrier Grade Linux requirements on COTS hardware, it already targets IA CPUs and the available timers in standard IA chipsets.

Learn more at http://sourceforge.net/projects/high-res-timers.

7.3.2 Watchdog Timers

To enhance system reliability, many RTOS-based designs use watchdogs—timers that count down asynchronously; when a watchdog times out, the system resets or enters a low-level fault resolution mode. Watchdog time-out is supposed to occur only with out-of-control software, so programmers pepper their code with watchdog timer resets. That way, an expiring watchdog indicates a critical fault necessitating a reboot.

Linux increases system reliability through other means, but legacy applications may still need watchdog functionality to avoid re-architecting. The easiest way to emulate a watchdog is to use setitimer() to create and reset a process-based timer similar to an RTOS watchdog. When a process-based interval timer times out, a developer's program will have to handle the SIGALRM (or related) signal as part of its fault resolution sequence.
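A minimal sketch of that emulation appears below. The function names watchdog_arm() and watchdog_kick(), and the choice to _exit() on expiry, are illustrative assumptions, not a standard interface:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

/* Fault handler: runs only if the interval elapses without a kick. */
static void watchdog_expired(int sig)
{
    (void)sig;
    /* begin fault resolution here: log state, then restart or exit */
    _exit(EXIT_FAILURE);
}

/* Arm a one-shot timer that delivers SIGALRM after 'seconds'. */
int watchdog_arm(unsigned int seconds)
{
    struct itimerval wd = { { 0, 0 }, { seconds, 0 } };
    signal(SIGALRM, watchdog_expired);
    return setitimer(ITIMER_REAL, &wd, NULL);
}

/* Re-arm ("kick") the watchdog; healthy code calls this periodically.
   Passing 0 disarms the timer entirely. */
int watchdog_kick(unsigned int seconds)
{
    struct itimerval wd = { { 0, 0 }, { seconds, 0 } };
    return setitimer(ITIMER_REAL, &wd, NULL);
}
```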

7.4 Hardware Interfacing and Device Drivers As Sections 4 and 5 of this white paper described, all RTOS application and system code has access to the entire machine address space, memory-mapped devices, and I/O instructions. This “flat” world view can make it quite difficult to distinguish RTOS application code from driver code, especially since, even when an RTOS features an I/O “sub-system”, developers often still choose to perform I/O in-line:

#define PORT 0xFC000100

unsigned char getcchar(void)
{
    return (*((unsigned char *) PORT));   /* read char from port */
}

void putchar(unsigned char c)
{
    *((unsigned char *) PORT) = c;        /* write char to port */
}

The pervasive use of in-line memory-mapped I/O tempts developers new to Linux to port code “as-is” to user space by converting the #define of peripheral addresses into calls to mmap() (see below). This approach serves for prototyping, but is not suitable for commercial deployment; it does not support interrupt handling, offers limited real-time responsiveness, and is not secure.

Remember that Linux interrupt service is handled exclusively by the kernel. Even if a developer uses mmap() to perform read and write operations in-line in user space, they will have to put their Linux ISR into kernel space.

7.4.1 Typical RTOS-based I/O with Queues

Figure 7 illustrates a typical queue-based I/O scheme (input only). Processing proceeds as follows:

• An interrupt triggers ISR execution.

• The ISR either completes the input operation locally or lets the RTOS schedule deferred handling. Deferred processing is performed by an RTOS task.

• Ready data is placed into a queue.

• One or more tasks then read messages from the queue.

For output, instead of using write() or a similar call, one or more application tasks place ready data into a queue. The queue is drained either by an I/O routine or ISR responding to a “ready-to-send” interrupt or a system timer, or by another application task that pends on queue contents and then performs the I/O directly (polled, DMA, etc.).

Figure 7: Migrating legacy queue-based I/O to a Linux device driver paradigm

[Figure 7 contrasts the two models: on the legacy RTOS side, an ISR en-queues incoming data for consumption by tasks; on the embedded Linux side, a kernel-mode driver fields the interrupt and user-mode processes/threads retrieve the data via the read() system call.]

7.4.2 Mapping RTOS I/O into Linux User Space

The queue-based producer/consumer I/O model in Figure 7 is just one example of the ad hoc approaches employed in legacy embedded software. Using this straightforward example, there are several possible (re)implementations under embedded Linux, such as:

• Wholesale Port to User Space

• Re-architecting to Use Linux Drivers

• Preserving an RTOS Queue-based I/O Architecture

• Holistic Approach—Re-architecting

• API-based Approach

7.4.2.1 Wholesale Port to User Space

Developers who are reluctant to learn the particulars of Linux driver design, or who are in a great hurry, can try to port such a queue-based design, intact, into a Linux user-space program. In this mapping scheme, memory-mapped physical I/O takes place in user context via a pointer supplied by mmap().

#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

#define REGISTER_WIDTH 0x2 /* peripheral register size in bytes */
#define REGISTER_OFFSET 0xFF200010 /* physical address of peripheral */

int fd; /* mmap() requires a file descriptor */
void *pptr; /* pointer for memory-mapping */

fd = open("/dev/mem", O_RDWR); /* open phys mem (must be root) */

/* mmap() offsets must be page-aligned, so map the page containing
   the register, then index to it (4 KB pages assumed) */
pptr = mmap((void *)0x0, REGISTER_WIDTH + (REGISTER_OFFSET & 0xFFF),
            PROT_READ | PROT_WRITE, MAP_SHARED, fd,
            REGISTER_OFFSET & ~0xFFFUL); /* call mmap() to get a logical addr */
pptr = (unsigned char *)pptr + (REGISTER_OFFSET & 0xFFF);

A process-based user thread performs the same processing as the RTOS-based ISR or deferred task would, and then can use IPCs like msgsnd() to queue a message for receipt by another local thread or by another process via msgrcv().
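That hand-off can be sketched with System V message queues. The message layout and helper names below are illustrative only:

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <string.h>

struct io_msg {
    long mtype;      /* message type, must be > 0 */
    char mtext[64];  /* payload captured by the I/O thread */
};

/* Producer side: the user-space "ISR" thread en-queues data it read. */
int enqueue_data(int qid, const char *data)
{
    struct io_msg msg = { 1, { 0 } };
    strncpy(msg.mtext, data, sizeof(msg.mtext) - 1);
    return msgsnd(qid, &msg, sizeof(msg.mtext), 0);
}

/* Consumer side: an application thread blocks until data arrives. */
ssize_t dequeue_data(int qid, char *buf, size_t len)
{
    struct io_msg msg;
    ssize_t n = msgrcv(qid, &msg, sizeof(msg.mtext), 1, 0);
    if (n >= 0)
        strncpy(buf, msg.mtext, len);
    return n;
}
```

A queue created with msgget(IPC_PRIVATE, ...) connects the two sides; the same calls work across process boundaries, unlike most RTOS queues.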

While such a “quick and dirty” approach is good for prototyping, it presents significant challenges for building deployable code. Foremost is the need to field interrupts in user space. Linux offers a few ways to perform user-space interrupt processing, but those mechanisms are very slow (millisecond latencies instead of tens of microseconds for a kernel-based ISR). Furthermore, user-context scheduling, even with the preemptible Linux kernel and real-time policies in place, cannot promise timely execution of user-space I/O threads.

7.4.2.2 Re-architecting to Use Linux Drivers

A developer should write at least a basic Linux driver for interrupt processing. A simple character or block driver can field interrupt data directly in the “top half” or defer processing to a tasklet, kernel thread or work-queue. One or more threads/processes can open the device and perform synchronous reads, just as the RTOS application made synchronous queue receive calls. This approach will require at least recoding consumer thread I/O to use device reads instead of queue receive operations.

7.4.2.3 Preserving an RTOS Queue-based I/O Architecture

To reduce the impact of porting to embedded Linux, a developer could also leave a queue-based scheme in place and add an additional thread or daemon process that waits for I/O on the newly-minted device. When data is ready, that thread/daemon wakes up and en-queues the received data for use by the consuming application threads or processes.
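The shape of that adapter thread can be sketched with POSIX threads. A pipe stands in for the newly-written device below, and all names are illustrative (overflow handling is omitted for brevity):

```c
#include <pthread.h>
#include <unistd.h>
#include <string.h>

#define QDEPTH 16

/* A small thread-safe queue standing in for the legacy RTOS queue. */
static char q[QDEPTH][64];
static int q_head, q_tail, q_count;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_ready = PTHREAD_COND_INITIALIZER;

static void enqueue(const char *data)
{
    pthread_mutex_lock(&q_lock);
    strncpy(q[q_tail], data, sizeof(q[0]) - 1);
    q_tail = (q_tail + 1) % QDEPTH;
    q_count++;
    pthread_cond_signal(&q_ready);  /* wake a waiting consumer */
    pthread_mutex_unlock(&q_lock);
}

/* Consumer threads block here, as they did on the RTOS queue. */
void dequeue(char *buf, size_t len)
{
    pthread_mutex_lock(&q_lock);
    while (q_count == 0)
        pthread_cond_wait(&q_ready, &q_lock);
    strncpy(buf, q[q_head], len - 1);
    q_head = (q_head + 1) % QDEPTH;
    q_count--;
    pthread_mutex_unlock(&q_lock);
}

/* Daemon thread: block on the device, wake, and en-queue the data. */
void *io_daemon(void *arg)
{
    int fd = *(int *)arg;  /* device descriptor (a pipe in this sketch) */
    char buf[64];
    ssize_t n;

    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        enqueue(buf);
    }
    return NULL;
}
```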

7.4.2.4 Holistic Approach—Re-architecting

For projects not severely time-constrained, with goals to produce portable code for future revisions, a developer will want to spend time analyzing the current structure of their RTOS application and how it maps onto Linux.

For drivers, a developer should try to convert informal in-line I/O code into Linux drivers. If a legacy application is already well-partitioned, using RTOS I/O APIs or an adaptation layer, the conversion will be much easier. If, however, I/O code is scattered throughout the legacy sources, the work will be more challenging.

7.4.2.5 API-based Approach

Developers pressed to move off a legacy RTOS, or just trying to stand up a prototype, are likely to map or convert as many legacy APIs to Linux equivalents in place as possible. Constructs that are similar across the two OSes (comparable APIs, IPCs, system data types, etc.) port almost transparently. Others can be handled with #define redefinitions and macros. The remaining constructs will need recoding, preferably as part of an abstraction layer.
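A fragment of such a mapping layer might resemble the following; the shim names and the 60 Hz tick rate (a common VxWorks default) are assumptions for illustration only:

```c
/* rtos_compat.h -- illustrative shims mapping VxWorks-style calls
   onto Linux/POSIX equivalents */
#include <time.h>

#define SYS_CLK_RATE 60   /* assumed legacy system clock ticks per second */

typedef int STATUS;       /* legacy return-type alias */
#define OK    0
#define ERROR (-1)

/* taskDelay(ticks): convert RTOS clock ticks into a nanosleep() call. */
static inline STATUS taskDelay(int ticks)
{
    struct timespec req = {
        ticks / SYS_CLK_RATE,
        (long)(ticks % SYS_CLK_RATE) * (1000000000L / SYS_CLK_RATE)
    };
    return (nanosleep(&req, NULL) == 0) ? OK : ERROR;
}
```

Shims like these let legacy sources compile unmodified while a team incrementally replaces calls with native Linux equivalents.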

In general, user-space code is isolated from the Linux kernel and can only “see” explicitly exported symbols as they appear in /proc/ksyms. Moreover, visible system calls to the kernel are not invoked directly, but via calls to user library code. This segregation is intentional, enhancing stability and security in Linux.

When a developer writes a driver, the opposite is true. Statically-linked drivers are privy to the ENTIRE kernel name-space (not just exports), but have zero visibility into user-space process-based symbols and entry points. And when a developer encapsulates driver code in run-time loadable modules, that code can only utilize interfaces explicitly exported by the kernel via the EXPORT_SYMBOL macro.

8.0 Migration Resources This white paper has provided an overview of key migration challenges, principally by comparing the architectures and conventions of legacy RTOS-based systems with those of embedded Linux. It has also provided many concrete suggestions and engineering techniques to aid in migration. However, a developer can also build on a variety of prepackaged commercial solutions, including Linux platform products, tools, and services.

8.1 Commercial Solutions The major embedded Linux platform providers offer emulation and compatibility libraries for legacy RTOSes like VxWorks, pSOS, and Nucleus. Some commercial platforms include this kind of functionality, while others offer it as an add-on. A few third parties also offer RTOS “translation” libraries and utilities.

Wind River, for its part, plans to provide VxWorks emulation layer technology. This is essentially a full port of VxWorks for IA to Linux user space, which will support 100 percent emulation for legacy RTOS applications.

8.2 Professional Services and Training In the last five years, migration from legacy systems has been central to the business of all embedded Linux platform providers. As such, most have Professional Services and Training teams already expert in migration to Linux from VxWorks and other legacy RTOSes.

Wind River today offers its customers a choice between VxWorks and Linux, and also supports combinations of the two OSes in multi-tier applications. Its Professional Services organization and partner ecosystem (e.g., the PTR Group*) have VxWorks experience and first-hand knowledge of the migration process.

Another excellent source for qualified professional services is the Intel Communications Alliance. For more information on the members, visit www.intel.com/go/ica.

8.3 Other Migration Tools A wide range of other tools exist to ease and accelerate the migration process. These include:

• Source code browsers and analysis tools to help a developer understand the structure of legacy software

• UML parsers/generators that can create high-level representations of legacy source code

• Linux-aware hardware debug and run-control tools to ease board bring-up and device driver debugging

• Off-the-shelf RTOS emulation libraries

• Optimizing compilers, especially Intel’s IA compiler with high gcc compatibility

• Real-time analysis tools to measure latencies and find performance bottlenecks

• Linux ports of familiar middleware and RTOS-based protocol stacks that confer forward compatibility to Linux ports of legacy applications

A developer can find suppliers for these tools and more in the Intel Communications Alliance.

9.0 Conclusion Migration from legacy RTOS-based systems can be very straightforward, or very involved, depending on the particulars of the legacy code base and the “personality” of the legacy RTOS itself. Migration from the industry's most ubiquitous RTOS, Wind River VxWorks, occurs most frequently, and as a consequence is the best-known and clearest trail. Whatever the challenges faced during migration, a successful move to embedded Linux on IA from legacy software and hardware can provide compelling benefits in performance, interoperability, cost and time-to-market savings.

References

Dietrich, Sven-Thorsten. “The Rise of Real-time Linux: Hanging on the Telephone.” Linux User and Developer. Number 50, 2005.

Lanfear, Chris. “Linux Adoption Factors.” Embedded Linux: Coming soon to a Device Near You. Ziff-Davis eSeminar, May 2005

Lanfear, Chris. Real-time Requirements Satisfied by Linux for Respondents Using Linux. VDC, 2005.

Weinberg, William. “Porting RTOS Device Drivers to Embedded Linux.” Linux Journal. October 2004.

White, Brandon. “Linux 2.6: A Breakthrough for Embedded Systems.” LinuxDevices.com. September, 2003.

† Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor supporting Hyper-Threading Technology and an HT Technology- enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software used. See www.intel.com/info/hyperthreading for more information, including details on which processors support HT Technology. §Intel® EM64T requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel EM64T. Processor will not operate (including 32-bit operation) without an Intel EM64T-enabled BIOS. Performance will vary depending on your hardware and software configurations. See www.intel.com/info/em64t for more information including details on which processors support Intel EM64T or consult with your system vendor for more information. AdvancedTCA and the AdvancedTCA logo are the registered trademarks of the PCI Industrial Computers Manufacturers Group*. Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel® products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel may make changes to specifications and product descriptions at any time, without notice. Information regarding third party products is provided solely for educational purposes. 
Intel is not responsible for the performance or support of third party products and does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of these devices or products. Intel, Intel Pentium and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2005 Intel Corporation. All rights reserved. 0805/KSC/QUA/PDF C Please Recycle 309103–001US