Universidad Técnica Federico Santa María
Departamento de Informática
Valparaíso – Chile

VERIFYING REAL-TIME PERIODIC PROPERTIES ON THE ALMA COMMON SOFTWARE TIME SYSTEM

Thesis submitted in partial fulfillment of the requirements for the academic degree of

MASTER OF SCIENCE IN INFORMATICS ENGINEERING
and the professional title of
INFORMATICS CIVIL ENGINEER
by
Mauricio Alejandro Araya López

Evaluation Committee:
Horst H. von Brand (Advisor, UTFSM)
Raúl Monge (UTFSM)
Tzu-Chiang Shen (ALMA)

MAY 2008

THESIS TITLE: VERIFYING REAL-TIME PERIODIC PROPERTIES ON THE ALMA COMMON SOFTWARE TIME SYSTEM

AUTHOR: MAURICIO ALEJANDRO ARAYA LÓPEZ

Thesis submitted in partial fulfillment of the requirements for the academic degree of Master of Science in Informatics Engineering and the professional title of Informatics Civil Engineer at the Universidad Técnica Federico Santa María.

Horst H. von Brand, Advisor

Raúl Monge, Co-referee

Tzu-Chiang Shen, External Examiner

May 2008. Valparaíso, Chile.

I love deadlines. I like the whooshing sound they make as they fly by.
Douglas Adams

Acknowledgments

This thesis is not mine alone; it belongs to everyone who walked beside me through my university years. Many people supported me during this time, and in this brief note more than one will escape me. Ergo, I want to thank first all those whom I will unintentionally forget to mention.

My parents financed my degree, put up with my bad moods and supported me whenever it was needed. My gratitude, however, goes further. My professional formation is deeply marked by the example of two outstanding professionals who, without setting family aside, have become authorities in their disciplines without needing to harm anyone. But what I am probably most grateful for are the values and virtues they instilled in me to face this world with joy, kindness and enthusiasm. Thank you, Dad and Mom.

To my siblings Carolina and Daniel, I want to thank you for your concern, support and friendship during all these years. You paved my way through university, not only by example but through your advice and experience. You can rest assured that La Fábula de los Tres Hermanos did not come true with us.

For my girlfriend Carolina, I believe thanks are not enough, since she was the one who dealt with the hardest part of my thesis: my bad temper. Thank you for putting up with all my bad moments, my stress and my refusals. Thank you for pushing me just when I needed it, since without you I would still not have finished this thesis. Thank you for reading, reviewing and listening to my thesis, giving me advice with sincerity, kindness and affection. Thank you for supporting me throughout this whole process, but above all, thank you for all the love you gave me during this year.

Many thanks to Dr. Horst von Brand, for being my teacher, tutor and friend; a true academic mentor who taught me that the rigor of the university is not incompatible with the joy of doing what one is passionate about. In addition, I want to thank him for allowing me to be part of all those activities that were my true university education: LabComp, uSCI, teaching assistantships, Encuentro Linux, CSRG, and for trusting me to teach classes in this last stage.

My thanks also extend to the rest of the thesis committee, Mag. Tzu-Chiang Shen and Dr. Raúl Monge, for taking the time to read my work properly and contribute the pertinent corrections.

This work would not have been possible without the support of the Departamento de Informática UTFSM and the ALMA Project through its institutions ESO and NRAO. In particular, thanks are extended to the ALMA-CONICYT project #3106008 and to the Programa de Iniciación Científica of the Escuela de Graduados.

The research for this thesis was carried out entirely within the Computer System Research Group and the ALMA-UTFSM Group, so rather than thanking the following people, I wish to make them an official part of the results obtained. The co-authors of this research are Jorge Avarias, Claudio Salazar, Rodrigo Tobar, Yonathan Dossow and Horst von Brand. I also greatly value the time the ALMA engineers devoted to training me, among whom Bogdan Jeram, Thomas Juerges, Gianluca Chiozzi, Jorge Ibsen, Nicolás Troncoso, Nicolás Barriga and Heiko Sommer stand out. Very special thanks go to Jorge Ibsen and Flavio Gutierrez, for always trusting us and giving rise to the wonderful collaboration within which this thesis is framed. Thanks are extended to the whole Software Team of La Silla Observatory 2004, obviously including Tzu, Bernhard, Ruben and Percy.

I wish to thank my CSRG colleagues Matias, Toby, Javarias, Joao, Alejandro, Norman, Tomás, Yonathan, Carlos, Ntroncos, Nbarriga and Raraya. I am grateful for the time and patience of Dr. Salinas, Dr. Hoffmann, Professor Reyes, Professor Cañas, Pabla, Lidia, Don Ignacio, Carolina and Yanina. I also thank my classmates Carlitos, Bonva, Paola, LuchoX, Willy, Secho, Coke, Jorjazo, Fidel, Joel, Oso and Igor.

How could I not thank all the great friends from the University, such as Diego, Andrea, Nicol, Cors, Cuico, Cumsille, Juanito, Verito, Yigo, Tulio, Marito, Sapo, Penny, Mave, Gnecco, Jota, Juan, Basterrica, Mito and Charlie, among others. Finally, a very special greeting to Negro, Guidi, Pablo, Ton, Berni, Ivo, my uncles and aunts, cousins and in-laws.

Valparaíso, Chile
Mauricio Araya
July 1, 2008

Resumen (Summary)

The ALMA Common Software (ACS) is the distributed framework on which the ALMA control applications are being developed. Unfortunately, for real-time applications several subsystems are not using the framework, because ACS does not provide explicit support for time-critical operations. However, ACS is built on a middleware with real-time support, so the possibility that the framework exhibits certain real-time characteristics cannot be ruled out. To make use of these middleware characteristics, the middleware must run on a real-time operating system. For this reason, this thesis explores some alternative real-time operating systems on which ACS could run, developing the support where necessary and explaining the difficulties in each case. The problem is that there is no high-level benchmarking method applicable to ACS, so it is not possible to determine which platform has better real-time behavior. In this context, this thesis proposes an empirical verification model for complex real-time systems, which focuses on the results of the realizations of the system rather than on the formalization of the elements that compose it. The goal is to apply the model to complex systems where formal verification becomes cumbersome and slow. Although the model is based on a formal specification of requirements, the verification is carried out empirically through repeated execution of the system. The model is based on the synthesis of periodic real-time requirements, and includes typical benchmarking patterns such as concurrency and distributed clocks. The model was applied to the ACS time subsystem, which is the service that controls the timers and clocks of the framework. The parameter values were selected based on real ALMA requirements, in order to obtain meaningful results. The results show that the ACS time subsystem does not provide real-time support, so general guidelines for building a real-time service for ACS are presented as a conclusion.

Keywords: Real-Time, Empirical Verification, Temporal Logic, ACS, ALMA

Abstract

The ALMA Common Software (ACS) is a distributed framework on which the ALMA software is being developed. However, several ALMA subsystems are not using the framework for real-time software, because ACS does not provide explicit support for time-critical operations. Nevertheless, ACS is built on a real-time middleware, so there is a chance that the system achieves real-time performance. This middleware must run on a real-time operating system to be able to provide real-time support. Therefore, this thesis explores some real-time operating system alternatives on which ACS may run, developing the support when needed and explaining the difficulties of each port. The problem is that there is no high-level real-time benchmark that can be applied to ACS, so there is no way of comparing the real-time behavior of ACS running on two different platforms. Therefore, this thesis presents a model of empirical verification for complex real-time systems, which focuses on the results of the realizations of a system rather than on a formal verification of the system's constitution. This model is applicable to complex systems where formal verification techniques are too hard to apply. Although the model is based on the formal specification of requirements, the verification is done empirically by executing the system several times. The model is based on synthesized periodic real-time requirements and includes common benchmarking patterns such as concurrency and distributed clocks. The model was applied to the ACS Time System, which is the ACS service that controls the timers and clocks of the distributed framework. To stress the system correctly, real-world ALMA requirements were selected to set up the parameters of the model. The results show that ACS lacks a time system with real-time support, so general guidelines for building an ACS real-time service are presented in the conclusions of this work.

Keywords: Real-Time, Empirical Verification, Temporal Logic, ALMA, ACS

Index of Contents

Acknowledgments
Resumen
Abstract
Index of Contents
Index of Tables
Index of Figures

1 Introduction
  1.1 Real-Time and Astronomical Instrumentation
  1.2 Real-Time and the ALMA Common Software
  1.3 Thesis Structure

2 State of the Art
  2.1 Real-Time Systems
    2.1.1 Hard and Soft Real-Time
    2.1.2 Deadline Awareness
    2.1.3 Embedded Systems
    2.1.4 Real-Time Programming
    2.1.5 Real-Time Operating Systems
    2.1.6 CORBA and Real-Time
    2.1.7 The ACE ORB
  2.2 ALMA Common Software
    2.2.1 ACS Architecture
    2.2.2 The Component/Container Model
    2.2.3 The ACS Time System
    2.2.4 ACS Operating System Support
  2.3 ALMA Control Requirements
    2.3.1 The ALMA Monitor and Control Bus
    2.3.2 The AMB Server and Manager
  2.4 Real-Time Requirement Verification
    2.4.1 Real-Time Benchmarks
    2.4.2 Real-Time Requirement Specification
    2.4.3 A Semi-Formal Specification: Time Utility Function
    2.4.4 A Formal Specification: Temporal Logics

3 Periodic Requirement Synthesis
  3.1 Types of Real-Time Requirements
    3.1.1 Periodic Requirement Formulas
  3.2 Verification Values
    3.2.1 Hard Real-Time Verification: The Jitter Value
    3.2.2 Soft Real-Time Verification: The MSE Value
  3.3 Clock Drifts

4 Empirical Verification Model
  4.1 Test Requirement Specification
    4.1.1 Obtaining a Timed State Sequence
    4.1.2 Repetitions
  4.2 Model Usage
    4.2.1 External Verification Extension
    4.2.2 Concurrent Tasks Extension
  4.3 An Example
    4.3.1 Hardware Configuration
    4.3.2 A Real-Time POSIX Periodic Task
    4.3.3 External Verification
    4.3.4 Requirement Frequencies and Repetitions
    4.3.5 Results

5 ACS Time System Verification
  5.1 Real-Time Operating System Support
    5.1.1 The ACS C++ Runtime Environment
    5.1.2 Real-Time Operating System Candidates
  5.2 Simulating ALMA Requirements
    5.2.1 Requirement Simulation
  5.3 Experimental Setup
    5.3.1 The ACS Client Test Program
    5.3.2 Concurrent Task Extension
    5.3.3 External Reader Extension
  5.4 Results
    5.4.1 Local Verification
    5.4.2 Local Verification with Concurrency
    5.4.3 Clock Drift Measurement
    5.4.4 External Verification

6 Conclusions and Future Work
  6.1 The Verification Model
  6.2 ACS Real-Time Review
  6.3 ACS Time System Verification

Bibliography

List of Tables

4.1 Number of repetitions for each test run
4.2 Maximal Jitter and average $\sqrt{MSE}$ values calculated for both operating systems for θ property verification
4.3 Maximal Jitter and average $\sqrt{MSE}$ values calculated for both operating systems for φ property verification
4.4 Jitter and MSE values corrected to the reference domain using linear regression for θ property
4.5 Corrected reader's maximal jitter and MSE values by $D_{Ext \to Sys}(\tau)$ transformation for θ requirement
4.6 Corrected reader's maximal jitter and MSE values by $D_{Ext \to Sys}(\tau)$ transformation for φ requirement
5.1 Local verification using the CLOCK component timestamps
5.2 Local verification using the clock_gettime() timestamps
5.3 Local Verification with 8 harmonic/non-harmonic tasks
5.4 External Verification Results

List of Figures

2.1 Portable Object Adapter
2.2 The ACE ORB
2.3 ACS Package Diagram
2.4 ACS Container/Component Model
2.5 The ALMA Control Physical Diagram
2.6 Time Event Signal
2.7 Time Event Sections
2.8 Degree of Timing Precision
2.9 Time Utility Function
3.1 Strict and Relative Periodic Realizations
4.1 The Physical Test Machine Deployment
4.2 Jitter Behaviour and Transformation
5.1 ACS C++ Real-Time Environment Block Diagram
5.2 Physical Simulation Diagram
5.3 Local Machine Setup
5.4 Clock Drift Behavior With and Without Tick Coordination

Chapter 1

Introduction

Real-time systems are everywhere nowadays, from special-purpose embedded or dedicated systems to very large distributed control frameworks. However, common knowledge about these systems is very limited, and the term is sometimes used incorrectly or imprecisely. Because of its colloquial use, the real-time concept commonly causes misunderstandings. The word real is often confused with fast or in user time. Moreover, in its early days the term real-time was used to describe interactive or time-sharing systems [Mar65]. Currently, the term describes systems in which the time constraints are very well defined, the deadlines must be met precisely, and the system behaviour is deterministic. There is no direct relation to how quickly a computation is delivered, because an early delivery can be as ruinous as a delayed response. There is also no relation to the user's psychological perception of time, because real-time systems are used for controlling critical or embedded systems, with machine timing constraints that are incompatible with interactive systems.

1.1 Real-Time and Astronomical Instrumentation

Real-time systems are not theoretical laboratory constructions, but systems with properties that the real world needs. Most of the advances in this area came from automotive, avionics, particle accelerator and astronomical instrumentation applications. In this last application area, real-time systems have been widely used to control telescope mounts and other critical hardware devices, but each observatory (and even each telescope) has its own custom real-time software. This problem extends to general astronomical software development, because each instrument of each telescope uses its own custom software. Therefore, a major trend in astronomical instrumentation software is to provide generic software patterns and applications to be used for several telescopes and related hardware.

The idea of building a common framework to reuse and interconnect astronomical software has been explored independently by several organizations in the past decade. For instance, the European Southern Observatory has developed the Very Large Telescope Common Software [P.S] to control the four 8-meter optical telescopes at the Paranal Observatory. The Gemini project, on the other hand, has reused the EPICS [DKK] framework developed for particle accelerator applications.

1.2 Real-Time and the ALMA Common Software

The ALMA project [Tar08], a joint venture between astronomical organizations in Europe (ESO), North America (NRAO) and Japan (NAOJ), will consist of at least fifty 12-meter antennas operating together to explore the millimeter and sub-millimeter wavelength range. Besides the software challenges of controlling and coordinating the antennas remotely, this project is the perfect opportunity to gather the common patterns and algorithms of different software cultures and approaches into a generic framework for astronomical applications. This challenge has been assigned to the ALMA Common Software (ACS) Team, which is in charge of building the generic framework on which the whole ALMA software is to be built. After seven years of development, ACS has become a robust, generic and distributed control framework that can also be used for non-astronomical applications.

During the past months, ACS has been tested on prototype antennas and on specific hardware that will be part of the ALMA Observatory. In general terms, the framework fulfills the desired requirements, but some specific subsystems are using work-arounds due to the lack of real-time component/container support and testing. ACS developers have decided not to support a direct container implementation for Real-Time Operating Systems (RTOS), since there is very little research on this specific issue, and thus the underlying risk is too high for software that will soon be officially deployed on production antennas. Unfortunately, these work-around solutions break the system consistency, making distributed coordination, including debugging and logging, even more difficult. This problem affects not only isolated real-time components: real-time communication with other real-time components currently cannot be done using ACS services.

Regardless, ACS C++ containers and services are strictly developed over the ACE/TAO middleware, so the framework may be able to fulfill real-time requirements, at least for C++, if a proper real-time operating system is set up. Although ALMA has decided not to support hard or soft real-time periodic requirements, some other large projects that are currently evaluating ACS, such as the ESO Extremely Large Telescope (E-ELT), could be interested in this support. The main problem is that there are no quantitative values to determine whether ACS services support a hard real-time periodic requirement; neither is a quality of service value known for soft real-time service performance. Perhaps ACS services are able to fulfill the current ALMA requirements (if the proper real-time operating system is selected and configured), but due to the lack of verification, ALMA engineers prefer proven work-around solutions, such as independent RTAI Linux kernel modules, to do the real-time control. The continued use of RTAI for real-time tasks is currently central to the debate on the direction real-time support in ALMA should take: RTAI development lags behind the Red Hat Enterprise Linux platform on which most of ALMA runs, but native real-time support in the Linux kernel is still immature [Pis08].

This thesis develops an empirical real-time verification model applied to the ALMA Common Software, and uses it to check whether ALMA could use ACS for its real-time requirements. For this purpose, the Maximum Jitter and the Mean Squared Error are used as generic measurement values for any real-time system, applied using an Empirical Verification Model. The idea is to provide comparative metrics to stress the possible operating systems for ACS and the ALMA software.
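
To give a first intuition for the two measurement values named above, the following minimal sketch computes them from a list of activation timestamps of a periodic task. The formal definitions are given in Chapter 3; here it is simply assumed that the error of each activation is the difference between the observed interval since the previous activation and the nominal period. The function names and the sample data are hypothetical.

```python
def period_errors(timestamps, period):
    """Deviation of each observed inter-activation interval from the nominal period."""
    return [(later - earlier) - period
            for earlier, later in zip(timestamps, timestamps[1:])]


def max_jitter(timestamps, period):
    """Largest absolute deviation from the nominal period (assumed definition)."""
    return max(abs(error) for error in period_errors(timestamps, period))


def mean_squared_error(timestamps, period):
    """Average of the squared deviations from the nominal period (assumed definition)."""
    errors = period_errors(timestamps, period)
    return sum(error * error for error in errors) / len(errors)


# Hypothetical activation times in milliseconds for a task with a 10 ms period.
samples = [0, 10, 21, 30, 42]
worst = max_jitter(samples, 10)
mse = mean_squared_error(samples, 10)
```

Under these assumptions, a single large deviation dominates the maximum jitter, which is why it suits hard real-time checks, whereas the mean squared error summarizes the overall quality of the schedule and is closer to a soft real-time measure.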

1.3 Thesis Structure

This thesis is organized as follows. Chapter 2 presents basic real-time theory, describes the tools and software that will be used, and surveys the state of the art of real-time verification and specification. Chapter 3 introduces the periodic real-time requirements and synthesizes the general specification into formulas and values that can be measured. Using these results, chapter 4 presents an empirical verification model for periodic real-time properties, giving an example of its usage. This model is then fully applied to the ACS Time System in chapter 5, where the main experimental results of this thesis are shown. Finally, the conclusions and suggestions for future work are presented in chapter 6.

Chapter 2

State of the Art

2.1 Real-Time Systems

Real-time systems are those in which the time correctness is as important as the logical correctness [Sta96]. This means that it is not sufficient to produce a correct answer to a given computation, but it is required to deliver the result with the correct timing [Abb06]. If the response is delivered too late, or in some cases too soon, the requestor may consider the lack of timing as a failure. To successfully meet timeliness requirements, the system behavior must be predictable at any moment, and the time of each operation must be deterministic. In general, the following design challenges should always be considered when building a real-time system [LB94]:

• Predictability: Well-defined, or deterministic operations with timing independent of the surrounding environment.

• User Control: The user (programmer) has ultimate control of the behavior of the system. For example, the user can change the scheduling policy dynamically, can change the priorities of the real-time tasks, and adjust the general throughput.

• Timeliness: The temporal correctness must be ensured in a real-time system, so the system must provide mechanisms which take this issue into account.

• Mission Oriented: A real-time system has multiple tasks, including system maintenance, desktop applications, etc. But during any given time period, a real-time system must have only one (or very few) critical operations with a special status: very high priority tasks with direct access to system hardware and peripherals [SGG04].

2.1.1 Hard and Soft Real-Time

When the time constraints are so important that a catastrophic event occurs if the time window is not met, the system has hard real-time requirements. These systems should be built in a deterministic fashion to ensure deadline compliance [Sta96]. For instance, a military defense system cannot miss a deadline in a missile trajectory calculation, because the consequences could indeed be deadly. Possible catastrophic events include not only life and death situations, but also heavy economic losses, whole system collapse or losing unique opportunities.

Real-time systems with less critical constraints are called soft real-time systems. These systems still have deadlines, but only need to execute the real-time tasks according to the desired schedule "on average". A classic example of a soft real-time system is a live video transfer protocol over a network. Each frame should be sent before a clear deadline, but a frame drop is not a catastrophic event, and the user may not even detect the fault. However, if the system misses too many deadlines in a short time period, the "on average" restriction is not met, and the video will stall. The measurement of the missed deadlines of a soft real-time system is often called Quality of Service (QoS) [Sie06].

Some authors [DM03] state that there is no precise dividing line between hard and soft real time, because there is no such thing as a 100% deterministic CPU, so all real-time systems have to satisfy an "on average" restriction. The hardness or softness of a real-time system is then a fault tolerance level and not a strict classification of systems.
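
The view of hardness as a tolerance level can be illustrated with a small sketch. It assumes, purely for illustration, that the quality of service is measured as the fraction of missed deadlines and that a requirement is expressed as the largest tolerated fraction, with zero corresponding to a hard requirement. The function names and the frame timings are hypothetical.

```python
def deadline_miss_ratio(completion_times, deadlines):
    """Fraction of tasks that completed after their deadline."""
    if not deadlines or len(completion_times) != len(deadlines):
        raise ValueError("need one completion time per deadline")
    missed = sum(1 for done, due in zip(completion_times, deadlines) if done > due)
    return missed / len(deadlines)


def meets_requirement(completion_times, deadlines, tolerated_ratio):
    """A tolerated_ratio of 0.0 models a hard requirement; larger values are softer."""
    return deadline_miss_ratio(completion_times, deadlines) <= tolerated_ratio


# Hypothetical video frames: send times and deadlines in milliseconds.
sent = [33, 85, 95, 140]
due = [40, 80, 120, 160]
ratio = deadline_miss_ratio(sent, due)        # one late frame out of four
hard_ok = meets_requirement(sent, due, 0.0)   # fails as a hard requirement
soft_ok = meets_requirement(sent, due, 0.25)  # acceptable if 25% may be late
```

The same trace therefore fails or passes depending only on the tolerated ratio, which matches the idea that hard and soft real time differ in degree rather than in kind.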

2.1.2 Deadline Awareness

The concept that defines a real-time system is deadline awareness, understanding a deadline as a critical time point at which the conditions of the system or environment change if the proper response has not been sent. In the general design of computer systems, deadlines can be slightly violated as long as the violations do not accumulate too much over a certain period, because systems can usually continue, restart or at least roll back to a recovery point [Kim00]. In hard real-time systems, there is no such recovery or restart from a deadline violation. Even so, real-time systems are still exposed to hardware or other failures, so the system not only has to deliver the normal operation results at the correct time, but must do so even when the hardware (or another component) fails. Thus, a hard real-time system should always have fault tolerance mechanisms to avoid deadline violations at any cost. To do so, the system has to be aware of the deadlines and handle them properly.

2.1.3 Embedded Systems

An embedded system is a purpose-specific computer system that is part of a larger system [BM05]. Embedded systems are semi-autonomous entities in a complex system that take care of specific hardware or human interfaces with very few responsibilities. A complex system should be understood as a system that manages many variables at the same time, such as a nuclear power plant, an intelligent vehicle highway system, an air-traffic control system, or an antenna array. For instance, an embedded system of an antenna array can be a local control unit (LCU) that controls the axis encoders and motors of one specific antenna.

Embedded systems are usually considered small, resource-limited, and purpose-specific systems, but the increasing sophistication of microprocessors and memory devices is allowing the design of configurable and programmable embedded systems as generic reallocable units rather than specifically manufactured units. As embedded systems usually control hardware, most of them are real-time systems, mainly because of the timeliness restrictions and the critical operations that special hardware manipulation involves. In fact, there is a strong bond between both concepts: embedded operating systems usually support real-time requirements, and embedded hardware is usually designed to fulfill them. The ongoing tendency to use off-the-shelf embedded systems to perform real-time tasks opens the problem of developing distributed real-time communication [DM03].

2.1.4 Real-Time Programming

The literature on programming with real-time requirements is often tied to a hardware-specific task [BM00] or closely bound to an application area [Yam04]. Because of this, it is very difficult to distinguish between general contributions and system-specific achievements. Some efforts were made during the past decades towards establishing general guidelines for real-time programming, in pursuit of a strong paradigm applicable to several systems. For instance, the very basic concept of priority-driven programming [Alv87] is the general method (currently known as preemptiveness) that allows deadline handling using resource expropriation. Another approach is the asynchronous message passing model [KVdR83], which extends linear-time temporal logic to real-time requirements and real-time programming (see also section 2.4.4). From the practical side, the Ada language [Int95] is the most prolific early contributor to real-time programming, establishing a whole list of real-time requirements for Ada 9X [Wel90], including task scheduling, queuing order, asynchronous transfer of control, IPC and synchronization, interrupt entry binding, and time specifications. Modern approaches are more related to software engineering than to direct real-time programming, such as using UML for real-time development [Dou03, SRR03], well-defined interfaces for real-time components [TWS06], modular object-oriented systems for developing distributed real-time systems [CHL99], high-level component programming and configuration [SJ92], or end-to-end engineering of dynamic real-time systems [Rav02]. For specific details on real-time programming in C/C++ and Java, please refer to [Kim04, DM03, FWDC+00, KSK04].
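
The preemption idea mentioned above can be made concrete with a small simulation. This is only an illustrative sketch of fixed-priority preemptive scheduling on a single processor with discrete time ticks, not a method taken from the cited works; the task tuple layout, the function name and the task set are assumptions.

```python
def simulate_preemptive(tasks, horizon):
    """Simulate fixed-priority preemptive scheduling on one processor.

    tasks: list of (name, priority, release_tick, duration) tuples, where a
    larger priority number wins. Returns the name of the task that runs at
    each tick, or None when the processor is idle.
    """
    remaining = {name: duration for name, _, _, duration in tasks}
    timeline = []
    for tick in range(horizon):
        # A task is ready once released and while it still has work left.
        ready = [(priority, name) for name, priority, release, _ in tasks
                 if release <= tick and remaining[name] > 0]
        if not ready:
            timeline.append(None)
            continue
        # The highest-priority ready task takes the processor, preempting
        # whatever was running during the previous tick.
        _, running = max(ready)
        remaining[running] -= 1
        timeline.append(running)
    return timeline


# A low-priority logger is preempted when the control task is released at tick 1.
schedule = simulate_preemptive([("logger", 1, 0, 3), ("control", 5, 1, 1)], 5)
# schedule == ["logger", "control", "logger", "logger", None]
```

The high-priority task runs as soon as it is released, so its deadline depends only on its own duration, while the low-priority task absorbs the delay.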

2.1.5 Real-Time Operating Systems

The main difference between a General Purpose Operating System (GPOS) and a Real-Time Operating System (RTOS) is that the former does not support direct control of the microprocessor (nor of peripherals) for timing issues [Lap92]. With a proper real-time operating system, the user (programmer) can develop programs that meet timeliness requirements by controlling, for instance, the scheduling, sleeping deadlines, and buffer flushing. The part of an operating system that controls process scheduling and direct access to hardware is the kernel, so this is the essential part that should be real-time compliant [SGG04]. Therefore, a real-time kernel must provide basic support for predictably satisfying real-time constraints, thread/process preemptibility, thread/process priority, deterministic synchronization mechanisms, predefined latencies and a specialized API for accessing this functionality.

The POSIX standard (1003.1) [IEE96] is a common standard for portable operating systems. Its extension for real time (1003.1b) [IEE94, FWDC+00] includes process primitives, process environment, files/directory access, I/O primitives, device functions, synchronization, memory management, scheduling, clocks and timers, and message passing functions. As an example of this standard, the Clocks and Timers section defines the CLOCK_REALTIME identifier and requires a POSIX-compliant resolution of at least 20 [ms]; a specific implementation can obviously offer a much finer resolution. This standard not only motivates portable applications and compatible real-time operating systems, but also provides a research base with practical implications. For instance, Feizabadi et al. present a formally verified framework for real-time scheduling manipulation on POSIX-compliant real-time operating systems [FLRS04]. Most current real-time systems (such as VxWorks [Col] or QNX [Hil92]) implement RT-POSIX. A survey of the most popular commercial POSIX-compliant real-time operating systems can be found in [BM05], and the details of early real-time operating systems (before 1994) can be found in [GMS94].

VxWorks is a commercial hard real-time operating system developed and distributed by Wind River Systems. With more than 20 years of experience, VxWorks is the most popular commercial real-time operating system nowadays, with more than 350 million deployments; it is widely used in applications such as automobiles, consumer devices, network switches and routers. Recently, Wind River has received a lot of press because VxWorks is used by the two rovers Spirit and Opportunity, which began exploring Mars in 2004. The Wind kernel used in VxWorks comprises only the minimal features needed for real time, so its footprint is as small as possible. Other features such as networking, file systems, or memory management can be included for a particular system. In addition to the basic requirements of a real-time operating system, such as preemptive scheduling, high reliability, and stringent performance, VxWorks offers additional benefits like a small memory footprint, fast boot time and scalability.
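
Since the operating systems discussed in this section are measured against RT-POSIX, it may help to see how the CLOCK_REALTIME resolution mentioned earlier can be inspected in practice. The sketch below uses Python's standard `time.clock_getres`, which wraps the POSIX `clock_getres()` call on Unix-like systems; the helper names and the 20 ms constant are assumptions introduced here to mirror the bound quoted in the text.

```python
import time

# Coarsest CLOCK_REALTIME resolution allowed by the bound quoted in the text.
POSIX_COARSEST_RESOLUTION_MS = 20.0


def realtime_clock_resolution_ms():
    """Resolution of CLOCK_REALTIME reported by the operating system, in ms."""
    return time.clock_getres(time.CLOCK_REALTIME) * 1000.0


def within_posix_bound(resolution_ms):
    """True if a resolution is positive and no coarser than the 20 ms bound."""
    return 0.0 < resolution_ms <= POSIX_COARSEST_RESOLUTION_MS


resolution = realtime_clock_resolution_ms()
compliant = within_posix_bound(resolution)
```

A reported resolution only states the granularity of the clock; it says nothing about the latency or jitter of timers built on top of it, which is what the verification model in later chapters measures.
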
VxWorks has adopted most of the RT- POSIX standard, but it also provides a parallel API for more advanced features that are not provided by POSIX. Probably the largest drawback of VxWorks is that it is designed for users expert in real-time and operating systems, so the learning curve is quite hard. Indeed, the whole environment is a C-like environment, including and commands. Any fault in the memory usage will cause a whole system collapse, because there is neither paging nor memory management. Moreover, there is no such thing as independent executables, because the ELF binaries do not have an entry point like main(). Instead, any function symbol can be called from any other function or directly from a shell or script. All these symbols share

the same memory space, so developers must be aware of all the function symbols so as not to duplicate an entry. All these aspects make VxWorks a very fine-grained real-time operating system, but also a very hostile environment to work in.

QNX [Hil92] is a commercial POSIX-compliant real-time operating system aimed primarily at the embedded systems market, developed and distributed by QNX Software Systems. Recently the source code of the QNX kernel was released for non-commercial use. As a microkernel-based operating system, QNX is based on the idea of running most of the system in the form of a number of small tasks, known as servers. The QNX kernel contains only CPU scheduling, interprocess communication, interrupt redirection, and timers. Everything else runs as a user process, including a special process known as proc, which performs process creation and memory management by operating in conjunction with the microkernel. This is made possible by two key mechanisms: subroutine-call type interprocess communication, and a boot loader which can load an image containing not only the kernel but any desired collection of user programs and shared libraries. Due to its microkernel architecture, QNX is also a distributed operating system. QNX Neutrino supports symmetric multiprocessing and bound multiprocessing (BMP), which is QNX's term for being able to lock selected threads to selected CPUs. BMP is used to improve cache hit rates and to ease the migration of non-SMP-safe applications to multi-processor computers.

During the past few years, several efforts have been made towards a hard real-time Linux kernel. A good description of the real-time requirements and challenges that a real-time Linux should consider can be found in [TGFB02]. Probably the best known projects nowadays are RTLinux [BY97, YB] and RTAI [DM03, MDP00].
The first one offers a high-performance hard real-time API that bypasses the Linux scheduler and interrupt controller by means of a special patch to the kernel. The interrupts are managed by special kernel modules designed for low latency. In the case of RTAI, the interrupts are managed by an underlying hard real-time nanokernel (ADEOS), and real-time tasks have to be written as kernel modules. There is also a user-space API (LXRT) that uses generic RTAI kernel modules to invoke ADEOS, with an obvious latency penalty. The main problem of these approaches is RT-POSIX compliance: standard POSIX calls are handled by the Linux kernel, so a parallel real-time API has to be designed. Therefore, the

porting of real-time applications originally written for a POSIX-compliant real-time operating system becomes very complex. A completely new approach, called Linux-RT and led by Ingo Molnar [RH07], is currently under development by the Linux kernel community. This new approach supports kernel preemption, priority inheritance, threaded interrupts, hard IRQs, high resolution timers, and a full port of blocking spin-locks to preemptible mutexes. The main advantage is that the system uses the same POSIX-compliant API; the drawback is that the project is currently under heavy development, and there are only a few benchmark results.
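The 20 [ms] minimum resolution that RT-POSIX requires of CLOCK_REALTIME, mentioned earlier in this section, can be compared with what a given host actually reports. The following is a minimal sketch under stated assumptions: the function and constant names are illustrative, and `time.clock_getres` is only available on Unix-like systems, so the sketch falls back to the resolution reported by `time.get_clock_info` elsewhere.

```python
import time

# Coarsest CLOCK_REALTIME resolution allowed by POSIX (20 ms), in seconds.
POSIX_CLOCKRES_MIN_S = 0.020


def realtime_clock_resolution_s():
    """Return the resolution of the wall clock reported by the host, in seconds."""
    if hasattr(time, "clock_getres") and hasattr(time, "CLOCK_REALTIME"):
        return time.clock_getres(time.CLOCK_REALTIME)
    # Fallback for platforms without clock_getres (an assumption of this sketch).
    return time.get_clock_info("time").resolution


def meets_posix_minimum(resolution_s):
    """True if a resolution is at least as fine as the POSIX minimum."""
    return 0 < resolution_s <= POSIX_CLOCKRES_MIN_S


if __name__ == "__main__":
    res = realtime_clock_resolution_s()
    print(f"reported resolution: {res} s, POSIX-compliant: {meets_posix_minimum(res)}")
```

Note that a reported resolution only states the granularity of the clock; it says nothing about the latency or jitter with which a task is actually woken up.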

2.1.6 CORBA and Real-Time

CORBA (Common Object Request Broker Architecture) is a vendor-independent architecture and infrastructure specification for distributed middleware systems [Obj]. CORBA has been standardized [Obj02] by the Object Management Group (OMG), which provides the common directives to transparently call remote objects and methods. CORBA is a specification rather than a single product, and several compatible implementations exist. This is achieved through an Interface Definition Language (IDL), in which the public interface of an object to be used by a distributed entity is specified. CORBA also specifies a mapping from IDL to several programming languages, which makes possible the interaction of heterogeneous objects developed in different languages.

The specification dictates that each application should use a specific language-dependent implementation called an Object Request Broker (ORB). An ORB is a realization of CORBA, providing the namespaces, services, and policies to publish an object on the network and to access the objects of other ORBs transparently. This is done through a Portable Object Adapter (POA), which manages the remote calls and calls local objects through the IDL language bindings. Remote callers use this object adapter through compiled IDL stubs, which provide the proper bindings to access the remote object in the programming language of the local ORB (see figure 2.1). The POA architecture also helps to develop distributed objects cleanly, because access control, load balancing, synchronization, socket manipulation, and other low-level details are delegated to the POA and ORB.

In 1999, an RT-CORBA specification was released by the OMG [Obj03, SK00] to address the growing tendency to use CORBA in real-time setups. This specification enhances the

existing CORBA towards end-to-end system predictability. The design patterns are very similar to those of a real-time operating system architecture, such as rate monotonic scheduling, priority inversion avoidance, preemptiveness, local resource management, and upper bounds on critical operations, among others. It also includes new design patterns for distributed systems, such as priority banding, predictable transport, predictable invocation, and priority propagation.

Figure 2.1: Portable Object Adapter.

2.1.7 The ACE ORB

Several open source and commercial ORB implementations have been developed during the past years, but only a few of them support the RT-CORBA specification. The best known RT-ORB nowadays is Douglas Schmidt's TAO (The ACE ORB) [Schb, Sch99]. This real-time ORB is based on the Adaptive Communication Environment (ACE) library [Scha]. ACE is an open-source object-oriented library that implements concurrent communication patterns across a range of operating system platforms. The idea of ACE is to simplify the development of heterogeneous object-oriented network applications by providing reusable C++ wrapper facades for each platform, with proven high performance and real-time support. From here, it is almost natural to think of an RT-ORB implementation for the C++ language using ACE. As can be seen in figure 2.2, the RT-ORB must be set up on a real-time operating system, which must support the RT-POSIX specification to properly use the real-time services. In the case of Linux, TAO works properly only for non-real-time tasks, because RT-POSIX support is a very new feature (currently under heavy development). There is a project called

OCEAN [Meo05] that intends to use TAO with RTAI (non RT-POSIX), but the project is currently inactive and is very difficult to install or test.

Figure 2.2: The ACE ORB

TAO is an open-source project, but as the project has matured over the years, a number of companies have come to offer commercial support. This success is due to the portability of ACE, which makes TAO the only RT-ORB available for some hardware platforms and real-time operating systems.

2.2 ALMA Common Software

ALMA Common Software (ACS) [C+02, C+04, RCG01] is a software infrastructure for the development of distributed systems based on the Component/Container model [SC03, S+04]. This framework was built to support the complex control requirements of coordinating the ALMA (Atacama Large Millimeter Array) radio-telescopes. ACS also stands for Advanced Control System, as the framework is geared towards any system that requires complex and distributed control [P+02].

2.2.1 ACS Architecture

ACS is a set of congruent packages that provides the development tools, the common services, and the patterns needed to successfully build distributed systems. Most of the ACS features are provided using off-the-shelf software, while ACS itself provides the packaging and the glue between them. For instance, ACS is based on the CORBA specification, which provides the whole infrastructure for the exchange of messages between distributed objects, so open source ORBs are used to provide this functionality. Common development tools, such as compilers, interpreters, unit-testing classes, and XML parsers, are also taken from the open source world and consistently integrated as a part of this development framework. These basic tools are the first software package layer of ACS, as figure 2.3 shows.

Figure 2.3: ACS Package Diagram.

The next two layers are the actual code of ACS, called Common Software, which is a set of libraries and applications that support the development of distributed applications based on the Component/Container Model (CCM). Currently, ACS supports components written in three major object-oriented languages: C++, Java, and Python. The main advantage of using ACS, besides the component/container model, is that each service is implemented for the three supported languages (at least partially). Most of these services are just convenient wrappers of the ORB services, but they are integrated by ACS to provide consistent logging,

error propagation, namespaces, event notification, and other services. The ORBs used are TAO [Schb], JacORB [Bro97], OmniORB [Gri], and some Mico [PR00] services.

The last layer provides high-level APIs and tools to configure and manage the system deployment and runtime. These tools are non-essential packages, but they offer a clear path for the implementation of applications, with the goal of obtaining implicit conformity to design standards and maintainable software.

2.2.2 The Component/Container Model

The ALMA Common Software is structured as components which run inside containers. A container is a runtime entity that loads and manages several portions of software called components. This model enforces a uniform structure in all subsystems and enables developers to focus on domain problems rather than software engineering issues. It also provides a clear division between the domain code (components) and the service code (container), such as socket manipulation, system lifecycle, or data channels.

The Component/Container Model goes beyond the original CORBA idea, so a Component/Container implementation was needed to fulfill the ALMA requirements. Unfortunately, back in the year 2000 when ACS was first designed, no off-the-shelf implementations were available, so a custom component/container implementation was developed by ACS on top of the standard CORBA capabilities. Motivated by successful component frameworks such as Sun's EJB or Microsoft's COM+, the Object Management Group has recognized the advantages of a standardized container model, and produced the CCM specification based on CORBA. The ACS Container/Component Model has been refactored to include OMG's terminology and concepts, and the ACS Development Group is currently investigating whether providing binary-level compatibility with CCM and J2EE is worthwhile.

The ACS Manager

The runtime entities of ACS are containers and clients that interact with each other using the CORBA specification and the specific ORB implementation. Clients are end-user applications designed to interact with components, while containers are designed to host components and manage their lifecycles. A deployment of a complex system can include many containers and probably several clients, so a global coordination service is needed to

locate the components, manage the data channels, and control the life cycle of the components and containers. This coordination service is called the Manager, and it behaves according to a centralized Configuration Database (CDB), which is filled in beforehand at the deployment stage.

Component Docking

Figure 2.4: ACS Container/Component Model

A component is an instance of a class that implements an IDL functional interface and extends a generic life-cycle implementation so that it can be docked in a container. As shown in figure 2.4, the life-cycle interface provides (among others) the basic methods init, run and shutdown, which are inherited from a basic ACSComponent class provided by the framework. The functional interface consists of the public methods that the component offers to other components or clients. It is defined through a CORBA IDL file, so other runtime entities can use the component in a distributed way and from any supported programming language. In figure 2.4, the IDL provides the observe and readValue methods. There is also a service interface, through which the container provides the ACS services to the component. These services are wrappers of the ORB services, information from other components obtained through the Manager and CDB, or other ACS services like a centralized logger.

2.2.3 The ACS Time System

The ACS Time System consists of time conversion helper classes and two C++ ACS components: the TIMER component and the CLOCK component. An ACS C++ component is a shared library written in C++, with a general CORBA interface (the lifecycle interface) that allows it to be loaded and run in a generic C++ container. The TIMER functional interface provides a method to register a callback in the client program space, which is then called from the (possibly remote) component. This callback can be a “one-shot” call, or a periodic call at a given interval. The CLOCK component provides several time conversions and the global time for the distributed system.

2.2.4 ACS Operating System Support

As ACS is based on basic off-the-shelf software (the first layer in figure 2.3), the operating system support depends on the portability of these basic packages. ACS uses a specific version of each tool, so if these versions can be installed on an operating system, ACS can be built there. But installing the basic tools is not an easy task, because they commonly conflict with the tools provided by the system. Currently, ACS is officially supported only on two specific Linux distributions, RHEL 4.x (Red Hat Enterprise Linux) and SL 4.1 (Scientific Linux). In the past, ACS was supported on Solaris and Windows, but this support was dropped due to lack of time and clients. The case of VxWorks is much more complex, because there are currently VxWorks users, but the support for this platform is broken. This is caused mainly by the ACE+TAO VxWorks support, which has been very unstable during the past few years. Some other Linux distributions have been tested by the ACS-UTFSM Group and other users, including Debian, Ubuntu, Arch and Fedora. A detailed description of the porting efforts of the ACS-UTFSM Group can be found in [Ava07].

2.3 ALMA Control Requirements

The ALMA Control Software Subsystem handles the control and coordination of the movement of the antennas. Therefore, this subsystem is in direct contact with the hardware of the antenna mounts and related devices. This subsystem must also connect the antennas with the other parts of the ALMA software in a distributed way, mixing two different worlds: the

real-time control of the devices and the distributed component/container model on which the ALMA software is built.

2.3.1 The ALMA Monitor and Control Bus

The ALMA Monitor and Control Bus (AMB) is the networking system for ALMA electronics, providing monitor and control capabilities from a master node called the Antenna Bus Master (ABM). This bus is a superset of CAN, which is a protocol and bus standard designed to allow microcontrollers and devices to communicate with each other without a host computer (see figure 2.5).

Figure 2.5: The ALMA Control Physical Diagram

The CAN version of ALMA works at 1 Mbps, and includes two additional (differential) pairs of wires, carrying a reset signal and a time event signal. The reset signal is usually false (logic 0), and becomes true when the system requires a full reset of all devices. The time event signal, on the other hand, is usually true (logic 1) and drives the bus to a quiescent false state

periodically with a duty cycle of 12.5% of the time event period. This time event period is specified to be 48 [ms], so the false state lasts for 6 [ms] in each period, as shown in figure 2.6.

Figure 2.6: Time Event Signal

The AMB can transmit only two types of messages: control signals and monitor signals. Control signals provide commands for hardware devices, while monitor signals request a specific device state. Devices must begin transmitting their responses within 150 [µs] of receiving a monitor request. Some of these messages are time critical, which means that they must be delivered in a time window that is defined in terms of a time event (TE). A TE-related control signal must not be sent earlier than the occurrence of a TE, and the system must complete the transmission no later than 24 [ms] after that TE. In the case of a TE-related monitor signal, the system may begin transmission no earlier than 24 [ms] after a TE, and it must complete the transmission no later than 4 [ms] before the next TE. This means that the time event period is split into three sections: the time window for real-time control signals (24 [ms]), the time window for real-time monitor requests (20 [ms]), and a quiet 4 [ms] reserved for waiting for the next TE. These sections are described in figure 2.7.
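The timing figures above can be turned into a small calculation. The following is a minimal sketch that only restates the numbers given in the text (48 [ms] period, 12.5% duty cycle, 24 [ms] control window, 4 [ms] quiet window); the function and constant names are illustrative and are not part of the ALMA software. It assumes that each boundary instant belongs to the section it opens.

```python
# Values taken from the text; the names themselves are illustrative.
TE_PERIOD_MS = 48.0       # time event period
TE_DUTY_CYCLE = 0.125     # fraction of each period spent in the false state
CONTROL_END_MS = 24.0     # control signals must finish by TE + 24 ms
MONITOR_GUARD_MS = 4.0    # monitor signals must finish 4 ms before the next TE


def false_state_duration_ms(period_ms=TE_PERIOD_MS, duty_cycle=TE_DUTY_CYCLE):
    """Length of the false (logic 0) part of one time event period."""
    return period_ms * duty_cycle


def te_section(elapsed_ms, period_ms=TE_PERIOD_MS):
    """Name the section that contains an instant, given ms elapsed since a TE.

    Assumption: each boundary instant is assigned to the section it opens.
    """
    if elapsed_ms < 0:
        raise ValueError("elapsed_ms must be non-negative")
    offset = elapsed_ms % period_ms  # position inside the current period
    if offset < CONTROL_END_MS:
        return "control"
    if offset < period_ms - MONITOR_GUARD_MS:
        return "monitor"
    return "quiet"


if __name__ == "__main__":
    print(false_state_duration_ms())
    for instant in (10.0, 30.0, 45.0, 53.0):
        print(instant, te_section(instant))
```

A classifier like this only locates an instant within the period; checking that a whole transmission fits inside its window would additionally require the start and end instants of the message.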

2.3.2 The AMB Server and Manager

As ACS does not support a real-time container to run real-time components directly, the ALMA Control team was forced to seek other proven alternatives to deliver TE-related

Figure 2.7: Time Event Sections

messages in real-time. They decided to use RTAI Linux kernel modules to meet the real-time requirements, and to use the mechanisms that this API provides to connect ACS components with these real-time kernel modules. The ABM is the master node of the CAN bus that controls the devices on the network. This machine contains two major software entities that connect the ALMA software to the CAN bus. The first one is the AMBManager, the C++ component in charge of sending monitor requests and controls to, and receiving monitor results from, special queues designed for real-time (using RTAI). This component is invoked by the software components that represent the hardware devices on the CAN bus. The second one is the AMBServer, a kernel-space module that uses the RTAI API to control the real-time queues, dispatching the items of each queue at the right time and in the right order.

2.4 Real-Time Requirement Verification

There is not much research on empirical verification of real-time systems, because verification is usually seen as a theoretical problem that validates the formal syntax and semantics of a specification language. This type of formal verification is usually done using timed automata [BSV94, Bou08], statistical simulation [BJMO06], general theorem proving [Seg95],

and program/code inspection [NJ83]. Another approach, the one this thesis explores, is an empirical verification method that verifies real-world system executions, rather than formalizing the system internals and analyzing the resulting formal models. Filling the need for practical verification methods is still an open problem [Hei98], because the link between formal specification languages and engineering is still weak, and the real-time domain is perhaps the best example.

2.4.1 Real-Time Benchmarks

The closest reference to empirical verification is the real-time benchmark, where a benchmark is understood as a set of tests and values used for comparing systems. In real-time, the desired values depend on the requirements, but in general they measure how predictable the responses are rather than how fast the system can perform the task. One of the first real-time benchmarks was Hartstone [WDS90], for high-level Ada requirements. Simultaneously, Rhealstone [Rab90] was developed to test low-level requirements. These benchmarks are still the most commonly used, including several modifications for distributed systems and object-oriented programming [Dru96, FMW+00]. Both benchmarks are synthetic, which means that rather than performing useful operations, they use a predefined scientific numeric computation that is bounded and predictable. This idea is derived from the Whetstone benchmark [CW76], so the operations of the computations are measured in Kilo-Whetstones, each of which is roughly equivalent to one thousand floating-point operations.

Hartstone proposes five test series of increasing complexity, which are modifications of the PH or PN series. PH stands for Periodic Tasks, Harmonic Frequencies; its objective is to provide test requirements with a set of concurrent tasks that are periodic and harmonic. This means that each task is executed at regular intervals, and the frequency of each task is an integer multiple of the frequencies of all tasks with lower frequencies. The PN series also consists of periodic tasks, but with non-harmonic frequencies, such as physical phenomena frequencies, or frequencies generated from a complex mathematical formula. Hartstone provides some values for the PH series frequencies and loads (Whetstones), but they are only examples of the usage of Hartstone, and are hardly applicable to newer and faster systems.
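The harmonic condition that separates the PH series from the PN series can be checked mechanically. The sketch below is an illustration only: the function name is hypothetical, and the frequency values are made up for the example rather than taken from the Hartstone definition. Exact rational arithmetic is used so that the divisibility test does not depend on floating-point rounding.

```python
from fractions import Fraction


def is_harmonic(frequencies):
    """True if every frequency is an integer multiple of every lower one."""
    freqs = sorted(Fraction(f) for f in frequencies)
    if any(f <= 0 for f in freqs):
        raise ValueError("frequencies must be positive")
    # Compare each frequency with all the higher ones.
    return all(
        (high / low).denominator == 1
        for index, low in enumerate(freqs)
        for high in freqs[index + 1:]
    )


if __name__ == "__main__":
    # Illustrative task sets (frequencies in Hz), not the Hartstone values.
    print(is_harmonic([1, 2, 4, 8, 16]))  # a PH-like task set
    print(is_harmonic([2, 3, 5, 7]))      # a PN-like task set
```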

2.4.2 Real-Time Requirement Specification

A property can be seen as an instance of a requirement, so the requirement specification is the first step of any verification. Real-time requirements can be specified in several ways:

• Informally: A natural language prose that describes the requirement in terms of the software needs, using qualitative statements and some quantitative values to describe deadlines and conditions.

• Formally: A precise description, often with a hard mathematical or logical base, which implies a deep knowledge of the nature, values, and restrictions of the requirement. Engineers often avoid formal specifications, because formal approaches involve extra work that can seem obvious or sometimes even useless.

• Semi-formally: A relaxed version of formal specifications that relies heavily on notes and labels. These methods are powerful tools to communicate ideas, usually used to formalize interactions while keeping the details in prose. Structured analysis [MW85, MW86b, MW86a] and statecharts [Har87] are the best known semi-formal methods used to describe real-time systems, because they can be easily adapted to timing constraints with only a few notes.

For a general overview of formal and semi-formal methods, please refer to [HM96]. Semi-formal approaches are the most popular specification methods among engineers. Alas, to verify a property, the underlying requirement must be decidable, so a formal requirement specification is needed to ensure that the requirement can be verified.

2.4.3 A Semi-Formal Specification: Time Utility Function

Systems always have a tolerance for deadlines, given by the nature of the requirements or by the properties of the controlled hardware [Kim00]. Even hard real-time systems use digital clocks, which introduce a basic degree of tolerance bounded by the clock step. Therefore, real-time systems can be classified by the degree of timing precision (DTP) at which they work [Kim04] (see figure 2.8). This quantity helps to select the proper hardware and software for a given problem. For instance, if the timing precision of a problem is near a minute, almost any desktop PC will be able to handle it, even over a very slow network. But if the requirements

are near microseconds, most general-purpose PCs are not capable of that resolution with accuracy.

Figure 2.8: Degree of Timing Precision Example. Here $d$ values are the specified deadlines, and $t$ values are the instants at which the system actually responds. For instance, $t_i$ is not in the DTP specified for the deadline $d_j$.

An interesting method to explicitly describe the deadlines is the Time Utility Function (TUF) [JLT85, RJL05]. This method is based on the utility of a response for a given deadline, in terms of the time difference between the requested and real instants. The idea is that the deadlines can be represented by a function that describes how useful a given response is in terms of its time distance from the deadline. The time/service utility function [Sie06] is a function $\mathbb{R}^+ \to [0, 1]$ that transforms the time $t_i$ of an event into a utility value $u$. This value helps to be aware of deadline drifts and penalizes late (or early) responses. For instance, a possible TUF for a set of deadlines $D \subset \mathbb{R}^+$, with $d_j \in D$, is given by 2.4.1.

$$
TUF_{d_j}(t_i) = \begin{cases} \dfrac{z - |t_i - d_j|}{z} & \text{if } |t_i - d_j| < z \\ 0 & \text{else} \end{cases} \tag{2.4.1}
$$

In 2.4.1, $t_i$ is the response time, $d_j$ is a deadline, and $z$ is the half length of the useful section of the time line. $TUF_{d_j}(t_i) = 1$ means that the response is fully useful, and $TUF_{d_j}(t_i) = 0$ that it is useless for the specific deadline $d_j$. The TUF in figure 2.9 is the periodic case of equation 2.4.1 with a $2z$ interval. This particular TUF is isochronal, meaning that an early

Figure 2.9: Time Utility Function Example. An isochronal utility function for periodic requirements.

response is as unwanted as a delayed response. Also, this function clearly comes from a soft real-time requirement, because delayed responses are unwanted, but acceptable. The TUF can be simplified for hard real-time as well, with the only difference that the function goes from $\mathbb{R}^+ \to \{0, 1\}$. For instance, the TUF for figure 2.8, with a DTP of $x$, would be given by 2.4.2.

$$
TUF_{d_j}(t_i) = \begin{cases} 1 & \text{if } |t_i - d_j| < x \\ 0 & \text{else} \end{cases} \tag{2.4.2}
$$
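Both utility functions are straightforward to evaluate numerically. The following is a minimal sketch of equations 2.4.1 and 2.4.2; the function names are illustrative. The periodic variant is an assumption of this sketch: it places the deadlines at integer multiples of the period, starting from time zero, and evaluates the response against the nearest one.

```python
def tuf_soft(t_i, d_j, z):
    """Equation 2.4.1: utility decays linearly with the distance to the deadline."""
    if z <= 0:
        raise ValueError("z must be positive")
    distance = abs(t_i - d_j)
    return (z - distance) / z if distance < z else 0.0


def tuf_hard(t_i, d_j, x):
    """Equation 2.4.2: the response is either useful (1) or useless (0)."""
    return 1 if abs(t_i - d_j) < x else 0


def tuf_periodic(t_i, period, z):
    """Periodic case of 2.4.1, assuming deadlines at 0, period, 2*period, ..."""
    nearest_deadline = round(t_i / period) * period
    return tuf_soft(t_i, nearest_deadline, z)


if __name__ == "__main__":
    # A response 1 time unit late, with a useful half length of 2 time units.
    print(tuf_soft(11.0, 10.0, 2.0))
    print(tuf_hard(11.0, 10.0, 1.0))
    print(tuf_periodic(49.0, 48.0, 2.0))
```

Because the distance is taken as an absolute value, early and late responses are penalized equally, which matches the isochronal shape described above.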

2.4.4 A Formal Specification: Temporal Logics

For many years logic has been used extensively to specify and prove properties of algorithms and programs. Therefore, it is natural to use logic to specify and verify temporal requirements, using a logic-based (or automata-based) language. This approach makes it possible to specify and verify system properties and to deduce important results, even in practice [AG93]. As reactive systems have special time constraints, a modal logic with temporal operators is needed to describe them. The original temporal logic was introduced by Pnueli [Pnu77] and has been studied extensively since then. This modal logic defines the basic operator $\Box$ as “always”, and its dual $\Diamond$ as “eventually”. Also, the operator until ($U$) is defined to specify

conditional temporal sentences, and serves to define the former operators. For instance, for the typical response property, “always when the proposition (or event) p is observed, eventually the proposition q will be observed”, the proper Propositional Temporal Logic (PTL) [GPSS80] formula is:

$$
\Box(p \rightarrow \Diamond q) \tag{2.4.3}
$$
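The response property can be illustrated on a recorded execution. The sketch below is a simplification under stated assumptions: temporal formulas are interpreted over infinite state sequences, whereas this check runs over a finite trace, so it can only report whether some occurrence of p is still unanswered when the trace ends. States are modeled as sets of proposition names, “eventually” is taken to include the current state, and the function name is illustrative.

```python
def holds_response(trace, p, q):
    """Check the response property on a finite trace of states.

    Each state is a set of the propositions that hold in it. The property
    fails if some state containing p is not followed (or accompanied) by a
    state containing q before the trace ends.
    """
    pending = False  # True while an observed p has not been answered by a q
    for state in trace:
        if p in state:
            pending = True
        if q in state:
            pending = False
    return not pending


if __name__ == "__main__":
    answered = [{"p"}, set(), {"q"}]
    unanswered = [{"p"}, {"q"}, {"p"}]
    print(holds_response(answered, "p", "q"))
    print(holds_response(unanswered, "p", "q"))
```

Note that a positive result says nothing about how long the answer took, which is the limitation that the metric extensions discussed next address.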

PTL is a classical example of a Linear Temporal Logic (LTL), the class of temporal logics that are interpreted over linear structures of system states. There is a wide variety of temporal logics nowadays, almost all of them based on Pnueli's operators. For more information about LTL-based logics and other temporal logic classes, please refer to [AH91].

A real-time system is a special case of a reactive system in which the timing constraints are critical, so the classical LTL must be modified to support them. Equation 2.4.3 only describes a qualitative temporal requirement or property of a system: it expresses that q will eventually be true if p is true, but gives no information about when this will happen. PTL is insufficient for the specification of quantitative temporal requirements, so this logic must be extended to incorporate metric time values in the requirements. Koymans's extensions [Koy87] were incorporated into the Metric Temporal Logic (MTL) [AH90], which adds timing constraints to the modal operators $\Box$, $\Diamond$ and $U$ by attaching a time specification as a subscript. For instance, equation 2.4.4 means that q will be true exactly t time units after p is observed.

$$
\Box(p \rightarrow \Diamond_{=t}\, q) \tag{2.4.4}
$$

MTL is a very useful tool to describe real-time system properties, because it provides a succinct syntax to describe complex requirements. Alas, MTL is still insufficient to verify or synthesize real-time requirements. Informally, it is easy to see that checking a precise time point on the real time line ($\mathbb{R}^+$) is not possible, because of its dense nature. Formally, checking even very simple real-time properties (such as equation 2.4.4) turns out to be undecidable [AH90, Koy87]. One solution to this problem is to weaken the time line to a non-dense (discrete) set, bounded by a digital clock precision or some other measurable physical time step [OW07]. A less invasive technique is the Metric Interval Temporal Logic (MITL) [AFH91], which uses the concept of an interval as a dense subset of $\mathbb{R}^+$. This way, the operators $\Box$ and $\Diamond$

can be bounded to an interval. For instance, equation 2.4.5 requires that p and the corresponding q proposition are separated by at least a time units and at most b time units.

$$
\Box(p \rightarrow \Diamond_{[a,b]}\, q) \tag{2.4.5}
$$

As the new modal operators can be defined with the constrained version of the “until” operator $U_I$, the MITL grammar can be inductively defined through 2.4.6.

$$
\phi := p \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid \phi_1 U_I \phi_2 \tag{2.4.6}
$$
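The bounded response of equation 2.4.5 lends itself to a direct check on recorded timestamps. The following sketch is not an MITL model checker: it assumes that p and q are instantaneous events with known timestamps on a finite trace, and that the interval [a, b] is closed. The function and parameter names are illustrative.

```python
from bisect import bisect_left


def holds_bounded_response(p_times, q_times, a, b):
    """True if every p at time t has some q at time t' with a <= t' - t <= b."""
    if a < 0 or a > b:
        raise ValueError("expected 0 <= a <= b")
    q_sorted = sorted(q_times)
    for t in p_times:
        # Earliest q that is not before the start of the window [t + a, t + b].
        k = bisect_left(q_sorted, t + a)
        if k == len(q_sorted) or q_sorted[k] > t + b:
            return False
    return True


if __name__ == "__main__":
    requests = [0, 48, 96]          # timestamps of p, in ms
    on_time = [1, 49, 97]           # every q arrives 1 ms after its p
    one_late = [1, 51, 97]          # the second q arrives 3 ms after its p
    print(holds_bounded_response(requests, on_time, 0, 2))
    print(holds_bounded_response(requests, one_late, 0, 2))
```

Sorting the q timestamps once and using a binary search keeps the check at O((n + m) log m) for n requests and m responses.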

Timed State Sequences

MITL formulas are interpreted over timed state sequences [AFH91], meaning that a formula can be proven to be true or false, over a specific timed state sequence. A timed state sequence is a classic state sequence bounded by time intervals. The general idea is to provide quantitative information of when a state becomes true on a system. Therefore, the first step is to formally define a time interval, to clearly introduce the timed version of a state sequence of a given execution of a system.

Definition 2.4.1. A time interval $I$ is a convex subset of $\mathbb{R}^+$, which is represented by a pair $(a, b)$ where $a \le b$ for $a, b \in \mathbb{R}^+$. An interval may be open, half-open, or closed, and it may be bounded or unbounded. Therefore, the possible forms of an interval $I$ are: $[a, b]$, $[a, b)$, $[a, \infty)$, $(a, b]$, $(a, b)$, $(a, \infty)$. The left end-point $a$ of $I$ is denoted by $l(I)$, and the right end-point $b$ by $r(I)$.

• I is singular iff it is closed and l(I) = r(I).

• Two intervals $I$ and $I'$ are adjacent iff $r(I) = l(I')$ and exactly one of these two end-points is open.

• The notations $t + I$ and $t \cdot I$, with $t \in \mathbb{R}^+$, denote the intervals $\{t' + t \mid t' \in I\}$ and $\{t' \cdot t \mid t' \in I\}$, respectively.
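Definition 2.4.1 and the properties listed above map naturally onto a small data type. The following is a minimal sketch, assuming that an interval is stored as its two end-points plus two flags that say whether each end-point is closed; the class and method names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    left: float                 # l(I)
    right: float                # r(I), may be math.inf
    left_closed: bool = True
    right_closed: bool = True

    def __post_init__(self):
        if self.left < 0 or self.left > self.right:
            raise ValueError("expected 0 <= left <= right")
        if math.isinf(self.right) and self.right_closed:
            raise ValueError("an unbounded interval must be right-open")
        if self.left == self.right and not (self.left_closed and self.right_closed):
            raise ValueError("a single-point interval must be closed")

    def contains(self, t):
        after_left = t > self.left or (self.left_closed and t == self.left)
        before_right = t < self.right or (self.right_closed and t == self.right)
        return after_left and before_right

    def is_singular(self):
        """Closed and l(I) = r(I)."""
        return self.left_closed and self.right_closed and self.left == self.right

    def is_adjacent_to(self, other):
        """r(I) = l(I') and exactly one of the two touching end-points is open."""
        return self.right == other.left and self.right_closed != other.left_closed

    def shift(self, t):
        """The interval t + I."""
        return Interval(self.left + t, self.right + t,
                        self.left_closed, self.right_closed)


if __name__ == "__main__":
    first = Interval(0, 1, left_closed=True, right_closed=False)   # [0, 1)
    second = Interval(1, 2)                                        # [1, 2]
    print(first.is_adjacent_to(second))
    print(first.contains(1), second.contains(1))
```

The adjacency test guarantees that the shared end-point belongs to exactly one of the two intervals, which is what makes a sequence of adjacent intervals a partition of the time line.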

Systems are ruled by logic, so a system state can be represented by the set of propositions that are true at a given instant. Let $P$ be the set of all the possible propositions on a system. Then, a system state $s$ is a subset of $P$. Let $s \subseteq P$ and $p \in P$. If $p \in s$, then the state models the proposition ($s \models p$), so $s$ is a $p$-state.

A system execution is a state sequence, which can be bound to time values by associating a time interval with each state in the sequence. To construct a timed state sequence, the consecutive states of the sequence must have adjacent intervals.

Definition 2.4.2. Let $\bar{s} = s_0 s_1 \ldots s_n$ be a state sequence where $s_i \subseteq P$, and let $\bar{I} = I_0 I_1 \ldots I_n$ be a sequence of time intervals. A timed state sequence $\tau = (\bar{s}, \bar{I})$ is a pair, where $I_0$ is left-closed, $l(I_0) = 0$, and $I_i, I_{i+1}$ are adjacent. Also, every time $t \in [0, r(I_n)] \subset \mathbb{R}^+$ must belong to a unique interval $I_i$. The function $\tau(t)$ with $t \in I_i$ gives the corresponding state $s_i$.
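Definition 2.4.2 can be sketched as a validation step followed by a lookup. In the sketch below, each interval is represented as a tuple (left, right, left_closed, right_closed) and each state as a set of proposition names; this representation and the function names are assumptions made for the illustration.

```python
def make_timed_state_sequence(states, intervals):
    """Validate the conditions of Definition 2.4.2 and pair states with intervals."""
    if not states or len(states) != len(intervals):
        raise ValueError("expected one interval per state")
    first_left, _, first_left_closed, _ = intervals[0]
    if first_left != 0 or not first_left_closed:
        raise ValueError("the first interval must be left-closed and start at 0")
    for current, following in zip(intervals, intervals[1:]):
        _, right, _, right_closed = current
        left, _, left_closed, _ = following
        # Adjacent: same end-point, and exactly one of the two sides is open.
        if right != left or right_closed == left_closed:
            raise ValueError("consecutive intervals must be adjacent")
    return list(zip(states, intervals))


def state_at(timed_sequence, t):
    """The function tau(t): the state whose interval contains the instant t."""
    for state, (left, right, left_closed, right_closed) in timed_sequence:
        after_left = t > left or (left_closed and t == left)
        before_right = t < right or (right_closed and t == right)
        if after_left and before_right:
            return state
    raise ValueError("t is outside the timed state sequence")


if __name__ == "__main__":
    tau = make_timed_state_sequence(
        [{"p"}, set(), {"q"}],
        [(0, 1, True, False), (1, 3, True, False), (3, 4, True, True)],
    )
    print(state_at(tau, 0.5), state_at(tau, 1), state_at(tau, 3))
```

Since adjacency forces every shared end-point into exactly one interval, the lookup never finds two candidate states for the same instant.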

Formally, for an MITL formula $\phi$ and a timed state sequence $\tau = (\bar{s}, \bar{I})$, the $\tau \models \phi$ relation can be inductively defined as:

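$$
\begin{aligned}
\tau \models p &\text{ iff } s_0 \models p \\
\tau \models \neg\phi &\text{ iff } \tau \not\models \phi \\
\tau \models \phi_1 \wedge \phi_2 &\text{ iff } \tau \models \phi_1 \text{ and } \tau \models \phi_2 \\
\tau \models \phi_1 U_I \phi_2 &\text{ iff for some } t \in I,\ (\bar{s}, \bar{I} - t) \models \phi_2, \text{ and } \forall t' \in (0, t),\ (\bar{s}, \bar{I} - t') \models \phi_1
\end{aligned}
$$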

For further details on this last definition please refer to [AFH91].

Chapter 3

Periodic Requirement Synthesis

If specifying real-time properties is hard, verifying them is twice as hard. Engineers and programmers do their best to equip a system with real-time properties, but they commonly do not verify whether these properties comply with the requirements. Verification has always been the domain of theorem proving, automata-based constructions, and simulation, because it is not possible to verify that a real-time property holds in a real-world system, since real-world failures happen in unpredictable ways. Regardless, an empirical verification technique could help engineers to stress real-world implementations, testing whether the real-time properties comply with the requirements when the system is operating normally. An empirical verification consists of verifying the properties of a real-world system by checking the results of several runs of the system, without any special knowledge of how the system is constructed. In contrast, formal verification requires a deep knowledge of how the system works, and each part of the system must be formalized in an automata-based model or similar, which is impractical for a large system such as an operating system or a framework/middleware.

The objective of this chapter is to define the values and algorithms needed to measure whether a real system conforms to periodic real-time requirements. This process of deriving values and algorithms from the basic idea of a requirement will be called synthesis. To produce the correct method, the first step is to define clearly what a periodic real-time requirement is, in order to understand the scope of the synthesis. This specification will be done using MTL and MITL syntax.

3.1 Types of Real-Time Requirements

Most real-time requirements are basic periodic requirements. A periodic requirement consists of a task that must be executed at regular intervals, usually specified by its frequency. The periodic requirement is basic if the frequency is fixed and must always be met while the system is running. As an example, a control system, such as an embedded system that controls hardware, usually has basic periodic requirements to perform the control loop for polling sensors and correcting actuators. A non-basic periodic requirement can be a sporadic requirement, where the fixed frequency task is executed or not depending on the system state. Another possibility is a dynamic frequency, where the task must always be executed, but the frequency depends on the system state. Finally, aperiodic requirements are those with no frequency associated, such as a reactive response to an unpredictable event like an exception [Sta96].

3.1.1 Periodic Requirement Formulas

A periodic requirement is defined by a frequency at which a task should be executed. This frequency defines the precise interval between deadlines, but it is not a complete specification of a deadline. A deadline cannot be a single point in a dense domain (i.e., ℝ+), as the frequency by itself specifies, but must be an interval that offers a tolerance to the punctuality (see section 2.4.4). Therefore a requirement ω will depend on a frequency f, an early bound a and a late bound b (see figure 3.1(a)). The parametric version of the requirement is then ω(∆, a, b), where ∆ = 1/f. A periodic requirement in which these three values are defined and fixed will be considered a well-specified basic periodic requirement.

Alas, including an interval as a deadline rather than an exact point adds new complications. The period ∆ is the time difference between deadlines, but the deadlines can be relative to different time points. The periodic deadlines can be relative to the global physical time, so the setup of each deadline only depends on the global time. For instance, if a system starts at time t0, the deadline di equals t0 + i · ∆, with an early bound li = t0 + i · ∆ − a and a late bound ri = t0 + i · ∆ + b. The series of deadlines depends only on the start time point and the global physical time, not on the current realization of the system. This type of requirement will be called Strict Periodic Requirement (SPR), and will be denoted by the symbol θ (see figure 3.1(a)).

Figure 3.1: Strict and Relative Periodic Realizations. The circles are each task execution, and the lines are the deadline specifications. (a) The interval ∆ = 1/f goes from each deadline specification to the next specification. (b) The interval goes from the last task execution to the next deadline specification. Also, the early and late bounds a and b can be seen on the figure.

The other possibility is that each new deadline depends on the instant when the previous deadline was met, so the series provides a consistent period ∆ between system task executions rather than from the initial t0 time point. This relative series is useful when the period between events is more important than the physical time point when the deadline is met.

Now, if the system starts at time t0, the deadline di equals ti−1 + ∆, where ti−1 is the time when the deadline di−1 was met (li = ti−1 + ∆ − a and ri = ti−1 + ∆ + b). This type of requirement will be called Relative Periodic Requirement (RPR) and will be denoted by the symbol φ (see figure 3.1(b)).

To illustrate the difference between these two approaches, consider the tracking of the position of a star in the sky: this requirement strictly depends on the physical time, because the sky does not depend on our system’s behavior. But in the case of controlling a fridge system, the temperature depends on the time of our last action, because the temperature is affected by how long our fridge system is on, so the next deadline is relative to the last execution of an action.
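The two deadline series can be illustrated with a minimal Python sketch. The function names, the use of a plain list of task execution times and the integer time units are assumptions made for this illustration only; they are not part of the thesis test-bench.

```python
def strict_windows(t0, delta, a, b, n):
    """Deadline windows [l_i, r_i] of a strict requirement.

    Each deadline d_i = t0 + i * delta depends only on the start time.
    """
    return [(t0 + i * delta - a, t0 + i * delta + b) for i in range(1, n + 1)]


def relative_windows(t0, delta, a, b, executions):
    """Deadline windows [l_i, r_i] of a relative requirement.

    Each deadline d_i = t_(i-1) + delta depends on the previous task
    execution time; the first one is measured from the start time t0.
    """
    windows = []
    previous = t0
    for t in executions:
        windows.append((previous + delta - a, previous + delta + b))
        previous = t  # the next window is anchored at this execution
    return windows


if __name__ == "__main__":
    # Hypothetical run: period 100, early bound 5, late bound 10.
    print(strict_windows(0, 100, 5, 10, 3))
    print(relative_windows(0, 100, 5, 10, [104, 201, 305]))
```

Note how a late execution (104 instead of 100) shifts every later window in the relative case, while the strict windows stay tied to t0.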

These ideas can be formally specified by logic formulas: let the proposition p0 mean “system start”, q be “physical time deadline” and p be “task execution”; then the general formulas for SPR and RPR are given by equations 3.1.1 and 3.1.2.

$$\theta_0(\Delta, a, b) := \Box\big((p_0 \rightarrow \Diamond_{=(\Delta-a)}\, q) \wedge (q \rightarrow \Diamond_{=\Delta}\, q) \wedge (q \rightarrow \Diamond_{[0,a+b]}\, p)\big) \tag{3.1.1}$$

$$\phi_0(\Delta, a, b) := \Box\big((p_0 \rightarrow \Diamond_{[\Delta-a,\Delta+b]}\, p) \wedge (p \rightarrow \Diamond_{[\Delta-a,\Delta+b]}\, p)\big) \tag{3.1.2}$$

Formula 3.1.1 should be read as follows: “Whenever the system starts, a physical time deadline will be observed exactly ∆ − a time units later. Also, whenever a physical time deadline is observed, another physical time deadline will be observed exactly ∆ time units later, and a task execution will be observed between 0 and a + b time units later”. Formula 3.1.2 follows the same idea. Note that φ does not use the q proposition, because it does not depend directly on the physical time deadlines, but only on the previous task execution and the system start.

Formulas 3.1.1 and 3.1.2 are not well-formed MITL formulas, because not all the operators are bounded (i.e., $\Box$), and expressions of the form $\Diamond_{=t}$ are not allowed [AH91]. To ensure the decidability of the requirements, the formulas must be modified to be well-formed MITL formulas. The $\Box$ operator can easily be bound to the time domain ℝ+ by including the subscript $\Box_{[0,\infty)}$. But real systems do not execute forever, so it can be bounded by a right closed interval $\Box_{[t_0,t_M]}$, where t0 is the time when the system starts and tM is when it halts. This states that the rules are valid only while the system is running.

To avoid the $\Diamond_{=t}$ expressions in formula 3.1.1, the explicit physical deadlines should be replaced by an implicit form that depends only on measurable propositions like p0 and p. A physical deadline depends on the series of deadlines that start with p0, so each deadline qi could be expressed by equation 3.1.3.

$$\Box(p_0 \rightarrow \Diamond_{=i\cdot\Delta-a}\, q_i) \tag{3.1.3}$$

Also, as each qi produces a pi event, the q propositions can be removed, as shown in the well-formed MITL formula 3.1.4.

$$\theta(\Delta, a, b) := \Box_{[t_0,t_M]}(p_0 \rightarrow \Diamond_{[i\cdot\Delta-a,\,i\cdot\Delta+b]}\, p_i) \tag{3.1.4}$$

with p1, p2, . . . , pn the finite series of task executions within the interval [t0, tM].

Following the same idea, a relative requirement depends on the previous task execution, so the well-formed MITL formula has the form 3.1.5.

$$\phi(\Delta, a, b) := \Box_{[t_0,t_M]}(p_{i-1} \rightarrow \Diamond_{[\Delta-a,\Delta+b]}\, p_i) \tag{3.1.5}$$

3.2 Verification Values

3.2.1 Hard Real-Time Verification: The Jitter Value

The behavior of a real-time system, for a specific run, can be represented by a timed state sequence, which will be called a realization of the system. A realization is the timed measurement of the system states for a specific run, so the objective of the synthesis is to check if a realization complies with the requirements specified in section 3.1.1. The following lemmas are easily deduced from formulas 2.4.7, 3.1.4 and 3.1.5, and serve to decide if a timed state sequence τ |= θ (or τ |= φ).

Lemma 3.2.1. A realization τ = (s̄, Ī) models the requirement θ(∆, a, b), if for all i there exists a τ∗ = (s, I) ∈ τ such that s |= pi, l(I) ∈ [t0 + i · ∆ − a, t0 + i · ∆ + b], and there exists a τ0 = (s0, I0), such that s0 |= p0, l(I0) = t0.

Lemma 3.2.2. A realization τ = (s̄, Ī) models the requirement φ(∆, a, b), if for all i there exist a τ∗ = (s, I) and a τ′ = (s′, I′) ∈ τ such that s |= pi, s′ |= pi−1, l(I) ∈ [l(I′) + ∆ − a, l(I′) + ∆ + b], and there exists a τ0 = (s0, I0), such that s0 |= p0, l(I0) = t0.

Therefore, for each deadline the checking functions 3.2.1 and 3.2.2 can be defined.

Cθ(t, i) = t ∈ [t0 + i · ∆ − a, t0 + i · ∆ + b] (3.2.1)

Cφ(t, tp) = t ∈ [tp + ∆ − a, tp + ∆ + b] (3.2.2)

In 3.2.1 and 3.2.2, i is the i-th deadline in the requirement specification starting at p0, tp is the time of the previous task execution (the left bound of its interval, or t0 for the first deadline), and t is the tested time. These functions only check if a specific time point is inside the interval of a given deadline, specified by the parameter i or tp (and also by the implicit parameters t0, ∆, a and b). To provide a complete verification of the whole realization, all the pi propositions should be observed at the right times, as detailed in algorithms 1 and 2.

Algorithm 1 Verify θ requirement: Vθ(τ, ∆, a, b)
    i ⇐ 1
    for j ⇐ 0 to |τ| do
        (sj, Ij) ⇐ τj
        if sj |= pi then
            if Cθ(l(Ij), i) then
                i ⇐ i + 1
            else
                return false
            end if
        end if
    end for
    return true

Algorithm 2 Verify φ requirement: Vφ(τ, ∆, a, b)
    i ⇐ 1
    tp ⇐ t0
    for j ⇐ 0 to |τ| do
        (sj, Ij) ⇐ τj
        if sj |= pi then
            if Cφ(l(Ij), tp) then
                tp ⇐ l(Ij)
                i ⇐ i + 1
            else
                return false
            end if
        end if
    end for
    return true

These functions can be used as a hard real-time test to check if a realization meets the θ or φ requirement for a given ∆. Moreover, comparable quantitative values can be obtained from functions Vθ and Vφ, by seeking the minimum a and b values for which the verification holds.

Definition 3.2.1. Let ∆ be a period, τ a timed state sequence, Vx a verification function (i.e., Vθ or Vφ), and A and B the sets of all the possible values of a and b respectively. The jitter Jx(∆, τ) is max(amin, bmin), where amin and bmin are the minimum a ∈ A and b ∈ B for which Vx(τ, ∆, a, b) holds.

In other words, Jx(∆, τ) is the maximum difference between the perfect realization of the requirement of type x and the current realization τ.

There are two ways of computing Jx(∆, τ). The first one is to run the Vx function several times, incrementing the a and b values by a pre-defined step, from 0 until Vx holds. The other approach is to set up a perfect realization model of deadlines, such as di = t0 + ∆ · i for the θ requirement, and select the maximum difference between the timed state sequence and the perfect model. This idea is presented in algorithms 3 and 4.

Algorithm 3 Jitter for θ requirement: Jθ(τ, ∆)
    i ⇐ 1
    max ⇐ 0
    for j ⇐ 0 to |τ| do
        (sj, Ij) ⇐ τj
        if sj |= pi then
            if max < |t0 + ∆ · i − l(Ij)| then
                max ⇐ |t0 + ∆ · i − l(Ij)|
            end if
            i ⇐ i + 1
        end if
    end for
    return max

Algorithm 4 Jitter for φ requirement: Jφ(τ, ∆)
    i ⇐ 1
    tp ⇐ t0
    max ⇐ 0
    for j ⇐ 0 to |τ| do
        (sj, Ij) ⇐ τj
        if sj |= pi then
            if max < |tp + ∆ − l(Ij)| then
                max ⇐ |tp + ∆ − l(Ij)|
            end if
            i ⇐ i + 1
            tp ⇐ l(Ij)
        end if
    end for
    return max
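Algorithms 1 to 4 can be sketched in Python as follows. This is only an illustrative sketch under a simplifying assumption: the realization is given as the ordered list of task execution times l(Ij), one per pi, so the state test sj |= pi is implicit. The function names are hypothetical.

```python
def verify_strict(times, t0, delta, a, b):
    """Hard test for a strict requirement: every i-th execution must
    fall inside [t0 + i*delta - a, t0 + i*delta + b]."""
    for i, t in enumerate(times, start=1):
        deadline = t0 + i * delta
        if not (deadline - a <= t <= deadline + b):
            return False
    return True


def verify_relative(times, t0, delta, a, b):
    """Hard test for a relative requirement: every execution must fall
    inside [tp + delta - a, tp + delta + b], tp being the previous one."""
    previous = t0
    for t in times:
        if not (previous + delta - a <= t <= previous + delta + b):
            return False
        previous = t
    return True


def jitter_strict(times, t0, delta):
    """Maximum distance between the executions and the perfect strict model."""
    worst = 0
    for i, t in enumerate(times, start=1):
        worst = max(worst, abs(t0 + delta * i - t))
    return worst


def jitter_relative(times, t0, delta):
    """Maximum distance between the executions and the perfect relative model."""
    worst = 0
    previous = t0
    for t in times:
        worst = max(worst, abs(previous + delta - t))
        previous = t  # updated on every execution, not only on a new maximum
    return worst


if __name__ == "__main__":
    run = [102, 199, 305]  # hypothetical execution times, period 100
    print(jitter_strict(run, 0, 100), jitter_relative(run, 0, 100))
    print(verify_strict(run, 0, 100, 5, 5), verify_relative(run, 0, 100, 5, 5))
```

With a = b, the verification holds exactly when the bound is at least the computed jitter, which is the relation that Definition 3.2.1 relies on.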

3.2.2 Soft Real-Time Verification: The MSE value

In the case of soft real-time verification, the Quality of Service (QoS) concept arises as a measure of the compliance of the requirement, rather than a boolean (yes/no) function such as Vθ or Vφ. The QoS can be understood as a distance from the perfect realization of the requirements in some domain. It is almost natural to think of a TUF mean, the average of the utility of a timed state sequence, but TUF has two major problems. First, the TUF specification requires an in-depth knowledge of the requirements and the system, which is often too hard to obtain and apply. Even worse, the metric is driven by this arbitrary TUF, which transforms the time domain into a utility domain that is not formally specified, so the values can be badly tainted by the TUF selection. Regardless, TUFs are a good practical approach to compare the QoS of heterogeneous soft real-time systems. In any case, the formal definition of the utility domain should be refined using new logic techniques.

On the other hand, the Mean Squared Error (MSE) is a well known metric (usually used as variance) which has been widely studied. The metric is the average of the squared errors of a sample, which are the differences between the sample values and a model or distribution. Let the differences between the deadlines and the realization be a random variable X; then MSE(X) (or loosely MSE(∆, τ)) can be calculated from a probabilistic model. This distribution must model the perfect realization in which each deadline i is exactly met, so the Dirac density functional δ(x) can be used, since it concentrates all the values on a single point x = 0 (the deadline), as equation 3.2.3 shows.

$$\delta(x) = 0 \ \text{ if } x \neq 0, \qquad \int_{-\infty}^{\infty} \delta(x)\,dx = 1 \tag{3.2.3}$$

Formally, X will be the set of the differences between the specification and the realization, where each xi = di − l(Ii), (si, Ii) ∈ τ and di is the deadline specification (i.e., t0 + ∆ · i for θ and l(Ii−1) + ∆ for φ). Then MSE(X) (or MSE(∆, τ)) will just be the average of the squared values of X.

$$MSE(X) = \frac{1}{|X|} \sum_{x \in X} x^2 \tag{3.2.4}$$
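The MSE value can be sketched in Python under the same simplifying assumption used for the jitter: the realization is reduced to the ordered list of task execution times. The function names are hypothetical, and the deadline of the first relative execution is assumed to be measured from t0.

```python
def mse_strict(times, t0, delta):
    """Average squared difference to the strict deadlines t0 + delta * i."""
    if not times:
        raise ValueError("the realization has no task executions")
    errors = [t0 + delta * i - t for i, t in enumerate(times, start=1)]
    return sum(e * e for e in errors) / len(errors)


def mse_relative(times, t0, delta):
    """Average squared difference to the relative deadlines tp + delta."""
    if not times:
        raise ValueError("the realization has no task executions")
    errors = []
    previous = t0
    for t in times:
        errors.append(previous + delta - t)
        previous = t
    return sum(e * e for e in errors) / len(errors)


if __name__ == "__main__":
    run = [102, 199, 305]  # hypothetical execution times, period 100
    print(mse_strict(run, 0, 100))    # errors -2, 1, -5
    print(mse_relative(run, 0, 100))  # errors -2, 3, -6
```

The square root of these values is expressed in the same time unit as the jitter, which makes the two measures directly comparable.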

3.3 Clock Drifts

Measuring time is always hard: clocks are not perfect, and even the most advanced atomic clock has a small drift from the physical time¹. Real-time requirements are often specified using physical time restrictions (such as θ and φ), but a clock is needed to measure the system behavior and verify these restrictions. This clock, which usually is the system clock, can be compared only with other clocks, so the best that can be done is to measure the clock drift from a reference clock, and transform the values to the reference time domain. The idea is to transform a timed state sequence from any clock measurement to the reference clock domain. Obviously, the reference clock should be the best clock available (i.e., the most accurate and precise), such as a real-time clock device, or an NTP server with a low stratum number. In the particular case that the reference clock is the same as the system clock, or if the timed state sequence can be obtained directly from the reference clock, there is no need for a time domain transformation. But generally, the querying frequency of the reference clock is restricted, so a clock drift function has to be deduced to interpolate the rest of the time points between the two clocks.

¹ Consider physical time as Newton’s idea of an absolute time that flows equably without relation to anything external.

Following an idea similar to Huang’s synthesis [HVC07], the clock drift function is defined.

Definition 3.3.1. Assuming there exists a true physical time, a clock clk is a function that maps each point t in the true physical time domain T to another time domain Tclk, with the form:

$$clk(t) = \int_0^t f(t')\,dt' \tag{3.3.1}$$

where f(t′) is the change rate of clk at true physical time t′.

Definition 3.3.2. A clock drift function between two clocks clk1 and clk2, with change rate functions f1(t) and f2(t), is defined as:

$$D_{(clk_1,clk_2)}(t) = \lim_{\delta \to 0^-} \frac{clk_2(t+\delta) - clk_2(t)}{clk_1(t+\delta) - clk_1(t)} \tag{3.3.2}$$

or by the equivalent equation:

$$D_{(clk_1,clk_2)}(t) = \frac{f_2(t)}{f_1(t)} \tag{3.3.3}$$

For example, the clock drift between a clock clkn(t) = αt + β and the perfect physical clock clkp(t) = t, is:

$$D_{(clk_n,clk_p)}(t) = \frac{\frac{d(\alpha t+\beta)}{dt}}{\frac{d(t)}{dt}} = \frac{\alpha}{1} = \alpha \tag{3.3.4}$$

This value means that clkn goes α times faster than the perfect physical time clock clkp, so it can be used to correct the clkn measures to have the results in the physical time domain. To calculate the drift function of a real clock in terms of a reference clock, the regression technique can be used by obtaining the time differences between the clocks. Assuming that our reference clock is perfect (clkr = clkp), a clkn(t) function can be obtained by a regression analysis, and the clock drift is then easily obtained by:

$$D_{(clk_n,clk_r)}(t) = \frac{d(clk_r(t))}{dt} \tag{3.3.5}$$

The decision of using linear or non-linear regression will depend on the general clock behavior, but a change rate function that depends on time (i.e., is non-linear) is very difficult to synchronize and apply. Clocks usually have constant change rate functions for longer intervals, so a linear regression can be applied assuming that the clock drift is linear; if not, the experiment is probably heavily tainted by the clock drift, so redoing the experiment with a more consistent clock is a better approach than trying to correct for it. Therefore, the decision of redoing the experiment will be driven by the correlation statistic of the linear regression.

A timed state sequence τ in the time domain Tclkn can be corrected to a time domain Tclkr by modifying the left and right limits of each interval I ∈ Ī using D(clkn,clkr)(t). Therefore, the notation τ′ = Dn→r(τ) will be used as a function that transforms the τ timed state sequence from the Tclkn domain to a τ′ in the Tclkr domain.
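The linear case can be sketched in Python as an ordinary least-squares fit. The sketch assumes that paired readings of the local and the reference clock are available, and it fits the reference readings as a linear function of the local ones; the function names and this particular choice of regression variables are assumptions for illustration, not the procedure used in the thesis experiments.

```python
import math


def fit_linear_clock(local, reference):
    """Least-squares fit reference ~ slope * local + intercept.

    Returns (slope, intercept, r), where r is the correlation
    coefficient that can drive the decision of redoing the experiment.
    """
    n = len(local)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two paired clock readings")
    mean_x = sum(local) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in local)
    syy = sum((y - mean_y) ** 2 for y in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(local, reference))
    if sxx == 0 or syy == 0:
        raise ValueError("clock readings must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def to_reference_domain(timestamps, slope, intercept):
    """Map local timestamps to the reference domain with the fitted line."""
    return [slope * t + intercept for t in timestamps]


if __name__ == "__main__":
    # Hypothetical paired readings: the reference advances twice as fast.
    slope, intercept, r = fit_linear_clock([0, 10, 20, 30], [1, 21, 41, 61])
    print(slope, intercept, r)
    print(to_reference_domain([5, 15], slope, intercept))
```

Applying `to_reference_domain` to both limits of every interval gives a transformed sequence in the spirit of τ′ = Dn→r(τ); a correlation far from 1 suggests that the linear assumption does not hold for that run.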

Chapter 4

Empirical Verification Model

A requirement can be formally verified by exploring the system constitution through model checking or an automata-based description, but for larger systems this approach is not practical. This chapter presents a consistent method to apply an empirical verification of the requirements, using only synthetic tests over the system behavior, no matter what the system constitution is. The proposed empirical verification model uses the values and algorithms introduced in chapter 3 to model and verify periodic real-time properties of a target system for hard and soft real-time. Also, the model includes an external verification extension and a concurrent tasks extension to give a more realistic perspective of the behavior of the system. The model is generic, so it can be applied to several systems regardless of the complexity of the code. To apply the model, the target system should be equipped with the following characteristics:

• Real-time periodic task support, either strict or relative

• A precise current time function that can be called at each deadline

• Concurrency of several real-time tasks

• A reference clock, such as a real-time clock or an NTP clock over a fast network (e.g., Gigabit Ethernet)

• A data port to make external measurements

Please note that the three last characteristics are requirements for the model extensions, so the basic conditions to apply the model are very easy to fulfill.

4.1 Test Requirement Specification

The model’s inputs are a parametric basic periodic requirement ω and a set R of realizations. To specify the requirement, the first step is deciding if ω is θ-type or φ-type. Deciding if the requirement is strict or relative is not an easy task, and a wrong decision will produce a wrong verification procedure. The key question here is whether the requirement is affected by physical time constraints, such as the (relative) movement of the sky, a pendulum period or the speed of light. Then the requirement parameters ∆, a and b must be specified. For instance, the isochronal periodic requirement of figure 2.9 could be interpreted by the values ∆ = 2z and a = b = z (ω = θ(2z, z, z)). The selection of the maximum early jitter a and the maximum late jitter b could be delayed if there is no specific information, but these values must be specified before drawing any conclusion from the measurements.

4.1.1 Obtaining a Timed State Sequence

To verify a system property, the behavior of the system can be analyzed by obtaining the timed state sequence of the system realizations. This must be done by executing the system normally, but adding a simple routine that registers the timestamp each time the task runs. To avoid I/O latencies, this routine should register the timestamp in memory, and dump all the results to a file when the test run ends. These timestamps T = {t1, t2, . . . , tn} can be read as the interval set Ī, where Ii = [ti, ti+1). The states should also be registered to know which deadline (or state) the timestamp should be linked to. If the test is only for an isolated property, registration of states is not necessary. In summary, each run (i.e., realization) of the system will produce a timed state sequence τ, which will be in the time domain of the host machine.
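The registration routine can be sketched in Python as follows. This is an illustration only: the class name and its methods are hypothetical, and `time.monotonic_ns` stands in for whatever precise time function the target system offers.

```python
import time


class TimestampLog:
    """Keeps the timestamps in memory and dumps them only when asked."""

    def __init__(self):
        self.stamps = []

    def mark(self):
        """Register the current time; meant to be called on each task run."""
        self.stamps.append(time.monotonic_ns())

    def intervals(self):
        """Read the timestamps as the intervals I_i = [t_i, t_(i+1))."""
        return list(zip(self.stamps, self.stamps[1:]))

    def dump(self, path):
        """Write one timestamp per line once the test run has ended."""
        with open(path, "w") as out:
            for stamp in self.stamps:
                out.write(f"{stamp}\n")


if __name__ == "__main__":
    log = TimestampLog()
    for _ in range(3):
        log.mark()  # in a real test this sits inside the periodic task
    print(log.intervals())
```

Appending to a list in memory keeps file I/O out of the measured loop, which is the point made in the paragraph above.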

4.1.2 Repetitions

An empirical verification is based on several repetitions of the same experiment, so in this case an important value to define is how many test runs to do. This value can be a constant given by the engineer’s experience, or a dynamic value in terms of the partial results. We suggest no less than 10 realizations and no upper limit if a constant value is selected, but encourage the use of a dynamic approach. A simple dynamic method is to observe the standard deviation of the Jitter and MSE values. If the standard deviation does not change very much, then more realizations will not affect the results. If it does, new realizations must be done until this value remains near a constant value.

The results of these repetitions will be a set R = {τ⁽¹⁾, τ⁽²⁾, . . . , τ⁽ᵏ⁾}, where τ⁽ⁱ⁾ is the i-th realization (timed state sequence) of the system, and k is the number of realizations discussed above.
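One possible reading of the dynamic method is sketched below in Python. The stopping rule, the function name and the 5% tolerance are assumptions chosen for illustration; the text only asks that the standard deviation stops changing noticeably.

```python
import statistics


def is_stable(values, minimum=10, tolerance=0.05):
    """Decide whether more realizations are still needed.

    `values` holds one Jitter (or MSE) value per realization so far.
    The run is considered stable when adding the latest value changed
    the sample standard deviation by at most `tolerance` (relative).
    """
    if len(values) < max(minimum, 3):
        return False  # too few realizations to judge
    previous = statistics.stdev(values[:-1])
    current = statistics.stdev(values)
    if previous == 0:
        return current == 0
    return abs(current - previous) / previous <= tolerance


if __name__ == "__main__":
    steady = [10, 12] * 6            # hypothetical jitter values
    disturbed = [10] * 11 + [1000]   # the last run is an outlier
    print(is_stable(steady), is_stable(disturbed))
```

The same check can be applied separately to the Jitter and to the MSE series, stopping only when both are stable.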

4.2 Model Usage

The model must be understood as a function that receives a requirement ω and a set of realizations R, as formula 4.2.1 shows.

EVM(ω, R) = [Bhard,Bsoft] (4.2.1)

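In 4.2.1, Bhard and Bsoft are boolean values that hold if the requirement complies with hard and soft real-time, respectively. These boolean values are obtained by formulas 4.2.2 and 4.2.3, respectively.

$$B_{hard} = \begin{cases} \text{true} & \text{if } \max_{\tau \in R} J_\omega(\Delta, \tau) < \max(a, b) \\ \text{false} & \text{otherwise} \end{cases} \tag{4.2.2}$$

$$B_{soft} = \begin{cases} \text{true} & \text{if } \dfrac{1}{k} \sum_{i=1}^{k} \sqrt{MSE_\omega(\Delta, \tau^{(i)})} < \max(a, b) \\ \text{false} & \text{otherwise} \end{cases} \tag{4.2.3}$$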

The model requires the following steps to be applied:

1. Specify the requirement

2. Run the system several times and obtain the timed state sequences

3. Calculate the maximum Jitter and the mean MSE of the realizations

4. Compare the (a, b) values with the maximum Jitter to decide if it complies with hard real-time

5. Compare the (a, b) values with the mean MSE to decide if it complies with soft real-time

We obtain two simple boolean values (hard and soft verification) that say if the system complies with the specified real-time requirement. This model uses the maximum Jitter and the MSE values for deciding if a realization complies, but a TUF function could be used to produce similar results. The problem is that it is very difficult to provide generic TUF functions to comply with several requirements.
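The decision of formulas 4.2.2 and 4.2.3 can be sketched in Python once the Jitter and MSE values of each realization are known. The function name `evm` and its argument layout (two lists with one value per realization) are assumptions made for this illustration.

```python
import math


def evm(jitters, mses, a, b):
    """Return (B_hard, B_soft) for one requirement.

    jitters: Jitter value of each realization.
    mses:    MSE value of each realization, in squared time units.
    a, b:    early and late bounds of the requirement.
    """
    if not jitters or len(jitters) != len(mses):
        raise ValueError("need one Jitter and one MSE value per realization")
    bound = max(a, b)
    b_hard = max(jitters) < bound
    mean_root_mse = sum(math.sqrt(m) for m in mses) / len(mses)
    b_soft = mean_root_mse < bound
    return b_hard, b_soft


if __name__ == "__main__":
    # Hypothetical values for three realizations.
    print(evm([5, 6, 4], [10.0, 16.0, 9.0], a=6, b=6))
    print(evm([5, 6, 4], [10.0, 16.0, 9.0], a=7, b=3))
```

Both comparisons are strict, so a maximum jitter equal to max(a, b) does not pass the hard test.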

4.2.1 External Verification Extension

In the model EVM(ω, R), each τ ∈ R is obtained using the system clock. The problem here is that the system property is driven by the same clock, so system latencies could be hidden from the model. The time domain transformation (section 3.3) helps to correct the values using a reference clock function, but unfortunately this correction is not enough to provide an external perspective of the system. The solution is to set up a real external instrument (e.g., another computer or an oscilloscope) to register the deadline timestamps through a data port, like the parallel or serial port. To do this, the timestamp routine must be modified to send a signal on the data port, besides registering the timestamp. Therefore, there are three time domains: the target system domain (Sys), the external system domain (Ext), and the reference domain (Ref). Any given timed state sequence τ can be transformed from the original domain to the reference domain using the measured clock drift functions and the DSys→Ref(τ) transformation. An indirect transformation can also be done, as shown in equation 4.2.4.

$$D_{Ext \to Sys}(\tau) = \frac{D_{Ext \to Ref}(\tau)}{D_{Sys \to Ref}(\tau)} \tag{4.2.4}$$

The idea is to transform a set R = {τ⁽¹⁾, τ⁽²⁾, . . . , τ⁽ᵏ⁾} of timed state sequences on the external time domain, to a set R′ = {DExt→Sys(τ⁽¹⁾), DExt→Sys(τ⁽²⁾), . . . , DExt→Sys(τ⁽ᵏ⁾)} on the target system time domain. This is an extension, because the external or reference clock is not always available.

4.2.2 Concurrent Tasks Extension

Many real-time systems should support concurrent real-time requirements, so the model could be extended to add controlled concurrency tests. Therefore, harmonic and non-harmonic Hartstone-like [WDS90] series could be used to test concurrent real-time tasks. Hartstone proposes five test series of increasing complexity, which are modifications of the PH or PN series. The PH series stands for Periodic Tasks, Harmonic Frequencies, whose objective is to provide test requirements with a set of concurrent tasks that are periodic and harmonic. This means that each task is executed at regular intervals, and the frequency of each task is an integer multiple of the frequencies of all tasks with lower frequencies. The PN series are periodic tasks, but with non-harmonic frequencies, such as the frequencies of physical phenomena, or frequencies generated from a complex mathematical formula. Hartstone provides some values for the PH series frequencies and loads (Whetstones), but they are only examples of the usage of Hartstone, and hardly applicable to newer and faster systems.

Following [DSW90], base-2 harmonic frequencies could be used for concurrency tests. For example, if a periodic property of f = 100 [Hz] will be verified, several harmonic tasks could be used as load, with frequencies like 50 [Hz], 25 [Hz], and 12.5 [Hz]. Generally, the i-th harmonic frequency will be f/2ⁱ. For non-harmonic series, the Fibonacci series Fi could be used, because it produces predictable non-harmonic phenomena and successive terms of the series are relatively prime, where the i-th non-harmonic frequency is f/Fi.
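Both load series can be generated with a short Python sketch. The function names are hypothetical, and starting the Fibonacci divisors at 2 (so that no load task shares the frequency f of the verified task) is an assumption of this sketch, not something the text prescribes.

```python
def harmonic_frequencies(f, count):
    """Base-2 harmonic load frequencies f/2, f/4, f/8, ..."""
    return [f / 2 ** i for i in range(1, count + 1)]


def fibonacci_frequencies(f, count):
    """Non-harmonic load frequencies f/2, f/3, f/5, f/8, ...

    The divisors are successive Fibonacci numbers, starting at 2.
    """
    frequencies = []
    previous, current = 1, 2
    for _ in range(count):
        frequencies.append(f / current)
        previous, current = current, previous + current
    return frequencies


if __name__ == "__main__":
    print(harmonic_frequencies(100, 3))   # loads for a 100 Hz property
    print(fibonacci_frequencies(100, 4))
```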

4.3 An Example

This section presents the configuration and conditions under which the described model was applied to some system calls of the real-time POSIX API over two real-time operating systems: Linux RT PREEMPT and QNX Neutrino.

4.3.1 Hardware Configuration

All the experiments were made using desktop-class PCs with an x86 processor architecture. By default, all unnecessary hardware was disabled to avoid extra interrupts. The specific hardware configuration is: Intel Celeron D 2.8 GHz (Prescott core), Intel 915 chipset based motherboard, 512 MiB DDR2 RAM in single channel configuration. Sound card, USB ports, parallel port and ACPI events were disabled in the BIOS configuration.

4.3.2 A Real-Time POSIX Periodic Task

For this example, real-time properties were simulated using a standalone application that executes a task periodically. To test the same property in both operating systems, the application was developed using the POSIX real-time specification. Unfortunately, there is no periodic timer in POSIX real-time (there is a non real-time timer in POSIX called itimer), so both θ and φ requirements were written in a test-bench suite, using POSIX real-time calls. The test-bench consists of two POSIX real-time programs called the writer and the reader, and several scripts for running the tests, collecting the data and calculating the results. The writer is the “event producer” program: it sends data with a fixed frequency via the serial port (RS-232) using non-blocking writing. This program can be configured to use a θ-like property or a φ-like property, registering all the needed timestamps to build the local timed state sequence. The minimum data that can be sent via serial port is 1 [byte], so the writer sends a character through the serial port with a top speed of 115200 [bps] (14400 bytes per second). The reader consists of an “event consumer” task that receives the characters through the serial port with blocking reading, registering the arrival timestamps to build the external timed state sequence. Some implementation details of the program are:

• All the times are measured in nanoseconds.

• The time is obtained by the clock_gettime system call, and the interval sleeping is done by clock_nanosleep, offering nanosecond resolution, and real-time precision according to [IEE94].

• A FIFO CPU scheduler was selected, using the sched_setscheduler system call. Additionally, the processes have been set up with a higher priority than normal processes, but lower than the IRQ handlers.

• The memory in which the program is loaded is locked to avoid paging interrupts, by the system call mlockall (this has effect only on Linux RT PREEMPT; in the QNX case this call has no effect because of the QNX design).

4.3.3 External Verification

Figure 4.1: The Physical Test Machine Deployment

The event consumer program (i.e., the reader) was deployed on another identical desktop-class PC, connected to the producer (i.e., writer) PC via an RS-232 serial port. The physical configuration is shown in figure 4.1. Both programs register the timestamps in memory, and then dump the data to a local file. These files will be the timed state sequences that will be verified. Also, for each realization the time differences from an external NTP server were obtained with a frequency near 1 [Hz]. These values will help to correct the reader’s timed state sequences to the writer’s domain.

4.3.4 Requirement Frequencies and Repetitions

The parametric requirements θ and φ depend on the frequency on which the controlled system will work. The idea of this example is to do a general survey of different frequencies.

Frequency [Hz]   Iterations   Repetitions
1                1000         10
10               1000         30
100              1000         50
1000             10000        100
10000            100000       100

Table 4.1: Number of repetitions for each test run

As it is impossible to test all the frequency possibilities, only one member of each order of magnitude has been considered. The selected frequencies are 1 [Hz], 10 [Hz], 100 [Hz], 1 [kHz], and 10 [kHz]. Lower frequencies such as 0.1 [Hz] are very uncommon in real-time systems, because a task executed with a 10-second period usually does not need real-time support. Higher frequencies such as 100 [kHz] were not tested because the serial port and other hardware used for the experiment do not support such frequencies. Each realization was executed for at least a thousand iterations, meaning that at least a thousand deadlines were tested in each test. In the case of higher frequencies (i.e., 1 [kHz] and 10 [kHz]), a thousand iterations give an execution time of 1 second or less, which is too short to obtain the clock drift. For this reason, the higher frequencies were tested with 10 000 and 100 000 iterations, respectively. Also, running each realization once is not enough for verification, so each frequency test was run several times. Depending on the frequency, the repetitions for each experiment vary, because the standard deviation changes for different ∆ values. Table 4.1 summarizes all the values given in this section.

4.3.5 Results

The following results are only an example of how to use the proposed model; they are not benchmarking results about RTOS or hardware. Please do not use these values as a validation for the selection of hardware or operating system.

                 θ Jitter [µs]      √MSE [µs]
Frequency [Hz]   Linux    QNX       Linux    QNX
1                46       65        31.6     43.5
10               45       64        18.3     39.9
100              44       63        12.8     23.3
1000             45       64        16.4     16.5
10000            37       98        15.0     14.7

Table 4.2: Maximal Jitter and average √MSE values calculated for both operating systems for θ property verification

Local Clock Domain

The model was applied to the proposed example, obtaining the timed state sequences for the local clock domain for several frequencies, over two different real-time operating systems and for the strict and relative requirement types. Using the strict periodic task, the maximum jitter and mean √MSE values for the θ requirement are shown in table 4.2. The same values, but for the φ requirement, are shown in table 4.3.

These values confirm that Linux and QNX support strict periodic requirements very well up to 100 [Hz], reasonably well up to 1 [kHz], and not well beyond that. It seems that Linux has better performance than QNX for hard and soft real-time, for θ-like requirements. The MSE values can be understood better through their square roots (table 4.2 and table 4.3), showing that the errors are of the order of 10 [µs], meaning that most of the jitter values are near the maximum jitter, so the behavior is quite similar for all the iterations. In the case of φ-like requirements, table 4.3 shows that both operating systems have very similar values. Please note that these values should be contrasted with a real requirement to produce relevant results. For instance, if a controlling system has a θ requirement of 20 [Hz] with a maximum jitter (or error) of 1%, both Linux and QNX comply. But for a φ requirement of 2 [kHz] with a maximum error of 10%, neither Linux nor QNX complies.

                 φ Jitter [µs]      √MSE [µs]
Frequency [Hz]   Linux    QNX       Linux    QNX
1                58       59        31.5     42.0
10               52       58        18.5     43.5
100              58       58        13.6     21.6
1000             57       60        12.9     16.7
10000            44       11        12.4     11.0

Table 4.3: Maximal Jitter and average √MSE values calculated for both operating systems for φ property verification

Reference Clock Domain

The local clock domain results are self-referential, because the same system clock is generating the events and registering the results. A simple (but not perfect) way to overcome this problem is to use a reference clock. In this example, the NTP differences are used to obtain a clock drift, and this function is then applied to each timed state sequence (i.e., $D_{Sys \to Ref}(\tau)$). Not all measurements are valid in this case, because if the clock drift behaves non-linearly, the correction becomes too hard to synchronize and to apply. For that reason, only the 5 repetitions with the highest correlations (≥ 0.99) are considered, just as an example of how the system behaves in terms of the reference time domain.

Unfortunately, the clock of the PC system is not good enough for real-time, because the measured slopes are too large to offer a strict periodic property, so in terms of the reference domain the errors are cumulative. As an example of this phenomenon, table 4.4 shows the strict periodic task corrected to the reference time domain on Linux RT PREEMPT. The worst slope is presented to show how this issue affects the jitter and MSE values. Please note that the low-frequency runs have longer execution times, so their cumulative error is much larger than at higher frequencies. This means that the system clock is not suitable for strict periodic real-time requirements, because the clock drift generates errors that are too large in long-term executions.
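The drift correction described above can be sketched as an ordinary least-squares fit of the NTP offsets against local time, followed by removing the fitted offset from each timestamp. This is a minimal Python illustration and not the thesis implementation: the function names are hypothetical, and the sign convention (offset = local time minus reference time) is an assumption, since the exact definition of the transformation is given in an earlier chapter.

```python
from math import sqrt


def fit_drift(local_times, offsets):
    """Least-squares line offset = slope * t + intercept, plus Pearson r.

    Assumes at least two samples and non-constant offsets.
    """
    n = len(local_times)
    mean_t = sum(local_times) / n
    mean_o = sum(offsets) / n
    s_tt = sum((t - mean_t) ** 2 for t in local_times)
    s_oo = sum((o - mean_o) ** 2 for o in offsets)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(local_times, offsets))
    slope = s_to / s_tt
    intercept = mean_o - slope * mean_t
    r = s_to / sqrt(s_tt * s_oo)
    return slope, intercept, r


def to_reference(timestamps, slope, intercept):
    """Map local timestamps to the reference domain (assumed sign convention)."""
    return [t - (slope * t + intercept) for t in timestamps]


# Hypothetical NTP samples: the local clock gains 0.5 ms per second.
samples_t = [0.0, 100.0, 200.0, 300.0]
samples_offset = [0.0, 0.05, 0.10, 0.15]
slope, intercept, r = fit_drift(samples_t, samples_offset)
usable = abs(r) >= 0.99  # keep only runs whose drift is close to linear
corrected = to_reference([1000.0], slope, intercept)
```

With a slope of this size, a timestamp taken 1000 seconds into the run is corrected by half a second, which shows why long, low-frequency runs accumulate the largest errors.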

Frequency [Hz]   Linux Jitter [µs]   Linux √MSE [µs]   Worst Slope
1                33249               15800             0.0000333
10               48045               1680              0.0004812
100              5386                1850              0.0005363
1000             309                 171               0.0000279
10000            62                  29.7              0.0000279

Table 4.4: Jitter and MSE values corrected to the reference domain using linear regression for θ property.

External Time Domain

Usually, real-time systems are meant to control specific hardware, which is not ruled by the same conditions as the controlling computer. The local clock domain results have the fundamental problem that the same computer is generating the tasks and measuring the time. A more realistic setup is the one presented in section 4.3.3, with an external computer with a real-time operating system that saves a timestamp every time it receives a character from the serial port. The problem with this approach is the clock drift, but if the slopes of both systems are measured from a reference domain, the timed state sequence of the external reader can be transformed to the domain of the producer system with $D_{Ext \to Sys}(\tau)$, as described in section 4.2.1. Figure 4.2 is an example of this transformation. The spikes in the figures are the worst cases, where the jitters are much higher than in the rest of the realization. These spikes are caused by system latencies, context switching, I/O delays, etc. The Max Jitter value is the worst of these spikes, while the MSE value measures the overall deviation from the x = 0 line. The corrected reader maximum jitter and √MSE values for θ are in table 4.5, and the same values for φ are in table 4.6.

(a) Jitter behavior at writer according to itself
(b) Jitter behavior at writer according to reader
(c) Jitter behavior at writer according to reader and corrected by the $D_{Ext \to Sys}(\tau)$ function described in section 4.2.1

Figure 4.2: Jitter behavior (figures 4.2(a), 4.2(b)) and transformation (figure 4.2(c))

For the strict periodic requirement, the maximum jitter that an external device perceives using Linux is roughly between 0.9 and 6 milliseconds (table 4.5), which is too much for the higher frequencies. Moreover, QNX shows jitters near 70 milliseconds, which is too much even for the lower frequencies. As the operating systems themselves offer much better performance, as in table 4.2, the problem might lie with the hardware (which is very likely because the test setup is not real-time equipment). The MSE values also show that the high jitters are not an isolated phenomenon, and that there is no direct relation between the execution time and the quality of service. Therefore, the high jitters were not produced only by the clock drifts, but also by the latencies of other devices. The differences between Linux and QNX might be produced by the implementations or other hardware manipulation primitives, but a detailed code analysis would be needed to verify this.

Frequency [Hz]   Jitter Linux [µs]   Jitter QNX [µs]   √MSE Linux [µs]   √MSE QNX [µs]
1                2542                61875             1801              20800
10               1218                75252             876               56000
100              1469                73696             353               46700
1000             5985                79544             5460              53600
10000            853                 15928             41.7              9360

Table 4.5: Corrected reader's maximal jitter and √MSE values by $D_{Ext \to Sys}(\tau)$ transformation for θ requirement

Whatever the detailed reasons for these results are, the most important conclusion is that the proposed system is not suitable for controlling a device with a strict periodic requirement of high frequency. For instance, for the requirement $\theta_A(1 \cdot 10^{-2}, 1 \cdot 10^{-4}, 1 \cdot 10^{-4})$ and $R'_E = D_{Ext \to Sys}(R_E)$, where $R_E$ is the set of realizations on the external device, the application of the model $EVM(\theta_A, R'_E)$ produces the results [false, false] for both real-time operating systems, which means that the system is not capable of offering either hard or soft real-time performance.

In the case of a relative periodic requirement, the results are much better. Indeed, QNX shows a very predictable maximum jitter over the different frequencies, with a very constant quality of service. These values are still affected by hardware deficiencies, but the performance is better than for the strict requirement. This is because each jitter is independent, so it does not affect the rest of the sequence. The results reinforce the basic idea that a system that is unsuitable for some requirements could be good enough for others. Here, for the requirement $\phi_A(1 \cdot 10^{-2}, 1 \cdot 10^{-4}, 1 \cdot 10^{-4})$ and $R'_E = D_{Ext \to Sys}(R_E)$, the application of the model $EVM(\phi_A, R'_E)$ produces the results [false, true] for QNX, which means that QNX could support a soft real-time requirement of 100 [Hz] with a maximum error of 1%. Moreover, for the same frequency, hard real-time performance can be obtained by relaxing the maximum error to 2%.

Frequency [Hz]   Jitter Linux [µs]   Jitter QNX [µs]   √MSE Linux [µs]   √MSE QNX [µs]
1                253                 184               138               55.6
10               201                 184               212               55.4
100              423                 184               101               54.5
1000             998                 184               855               54.3
10000            1096                27                1490              14.1

Table 4.6: Corrected reader's maximal jitter and √MSE values by $D_{Ext \to Sys}(\tau)$ transformation for φ requirement
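The [hard, soft] verdicts quoted for the 100 [Hz] requirements can be reproduced from the summary values of tables 4.5 and 4.6. The Python sketch below is only an illustration under an assumed reading of the model: the hard verdict compares the maximum jitter against the bound a, and the soft verdict compares √MSE against the bound b. The function name is hypothetical, and the actual model operates on the realizations rather than on table summaries.

```python
def evm_verdict(max_jitter_us, root_mse_us, a_s, b_s):
    """Return [hard, soft] for summary values in µs and bounds a, b in seconds.

    Assumption: hard real-time holds when the worst jitter is within a,
    soft real-time when the root mean squared error is within b.
    """
    hard = max_jitter_us <= a_s * 1e6
    soft = root_mse_us <= b_s * 1e6
    return [hard, soft]


# Rows for 100 Hz (period 1e-2 s), with a = b = 1e-4 s, i.e. 1% of the period.
theta_linux = evm_verdict(1469, 353, 1e-4, 1e-4)     # table 4.5
theta_qnx = evm_verdict(73696, 46700, 1e-4, 1e-4)    # table 4.5
phi_qnx = evm_verdict(184, 54.5, 1e-4, 1e-4)         # table 4.6
phi_qnx_relaxed = evm_verdict(184, 54.5, 2e-4, 1e-4) # a relaxed to 2%
```

Under this reading, both θ cases fail on both counts, the φ case on QNX passes only the soft check, and relaxing a to 2% of the period (200 [µs]) lets the 184 [µs] worst jitter pass the hard check as well.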

Chapter 5

ACS Time System Verification

The ACS Time System was built using the time functions that the ACE library provides. ACE claims to be real-time, so there is a chance that the ACS framework also behaves in real-time if a POSIX real-time operating system is set up beneath it. Unfortunately, the ACS Time System was built using the Component/Container model, so all the calls pass through CORBA. The TAO middleware also claims to be real-time but, unlike ACE, the TAO real-time support must be explicitly used through a different API. The ACS Container was built for multilanguage support, so the C++ Container based on TAO follows a general design for non-real-time applications in order to be compatible with the other Containers (Java and Python) and the Manager.

This chapter verifies the real-time ALMA Control requirements, using the empirical model of chapter 4. This verification is done by setting up a POSIX real-time operating system on which ACE can use the real-time capabilities and run the tests.

5.1 Real-Time Operating System Support

The base layer of ACS is composed of several open source packages, such as the GNU Tool Chain, the Java Environment, the Python libraries, the ACE/TAO ORB, the JacORB, the OmniORB, the loki library, etc. Supporting all these packages on a real-time operating system is quite a complex task. Therefore, a functional subset of ACS was defined to be supported over a real-time operating system.

5.1.1 The ACS C++ Runtime Environment

ACS can be easily divided into a runtime and a development environment, due to the structure of the ACS installation. Splitting the runtime and development environments helps to provide a suitable framework for embedded systems: the target system has only the minimal subset of files needed to run the applications, while the host system has all the files and tools to build, debug and deploy them. The general runtime environment of ACS is composed of the Manager, the Containers and the libraries to create clients. The Manager service is written in Java, and there is a Container program for each supported language. There are also shared libraries to support C++ stand-alone clients, JAR files to support Java clients, and Python modules to use Python scripts as clients. As ACE/TAO is the only middleware with proven real-time performance, the support of the Java and Python containers for real-time applications has to be postponed until the other ORBs support the RT-CORBA specification; for the same reason, the JAR files, the Python modules and the Manager are excluded from the real-time domain. Thus, the largest real-time runtime environment subset that can currently be achieved is composed of the C++ Container (called maciContainer), the ACS C++ libraries, and the shared libraries that the ACS libraries use.

Figure 5.1: ACS C++ Real-Time Environment Block Diagram

As part of the Repackaging ACS for Embedded Systems (RAES) [Ava07] project, an ACS C++ Runtime Environment was created using the libraries needed by the maciContainer. These libraries can be divided into base libraries and ACS libraries, as figure 5.1 shows. To support a functional ACS over a real-time operating system, these are the libraries that the system must handle.

5.1.2 Real-Time Operating System Candidates

Besides the system libraries (such as LibC, libStd++, libMath), the support of a real-time operating system will depend directly on the support of the base libraries of figure 5.1. This section summarizes the efforts on supporting these base libraries (and then the ACS libraries) on the three real-time operating systems described in section 2.1.5.

Linux RT PREEMPT

As ACS is officially supported on Linux (RHEL 4 and SL 4), the least painful port was to the Linux RT PREEMPT kernel. Even so, the port was not straightforward, because the RT PREEMPT patches force a kernel version update. A new kernel in turn forces an update of the base libraries to newer versions, on which ACS was not supported. The first challenge of this port was to configure the newer packages with the updated base libraries. The next challenge was to patch the ACS libraries to compile successfully with the newer compiler and the new libraries. Finally, the automated tests of the maciContainer were run to validate the port. The result of this process is the ACS-UTFSM Distribution, which is supported on several modern distributions such as Debian, Fedora, Ubuntu and Arch [Ava07]. An ACS C++ Runtime Distribution can be obtained directly from this distribution, by copying only the minimal subset described in section 5.1.1.

QNX

This thesis uses QNX in chapter 4 to stress the POSIX implementation, which worked without major problems. Unfortunately, the ACS port to QNX was unsuccessful. The GNU Toolchain and ACE/TAO work without problems on this platform and, as QNX is POSIX compliant, the ACS C++ code should compile on it. The other base libraries, such as loki, present some problems, but after a few modifications the whole base runtime environment works on QNX. Alas, ACS needs many other tools to construct the binaries, such as the IDL generators from XML files and other helper applications. The main problem here is that the J2SDK tools are not available for QNX, so most of the build system does not work properly. Therefore, a cross-compiling technique was explored, where a Linux host machine compiles the QNX binaries using an approach similar to the supported VxWorks compilation. The problem here is that the ACS build system refuses to use a cross-compiled TAO, so the shared library construction failed. Finally, manual compilation was tried, but the extremely complex dependencies of ACS make this impracticable. The definitive solution to this problem is to untangle and rewrite the ACS build system as a flexible and portable infrastructure; this work is still in its early stages, and was started through an undergraduate thesis by Marcelo Zúñiga. A summary of the current status of this work can be found in [Ava07].

VxWorks

This thesis planned to use a VxWorks platform as a commercial real-time operating system alternative. The ALMA project kindly lent a VxWorks machine, and the author was invited to visit ESO headquarters in Germany to develop a port of ACS to VxWorks 6.2. Initially, ACS and ACE/TAO did not compile on the new VxWorks version, so the first porting task was to patch the ACS and ACE/TAO code so that they compile successfully. The next stage was to test ACS on this platform, but the automated tests failed over and over again for reasons that are still unknown. The problem was successfully isolated, but the bugs are deep inside the ACE/TAO libraries. Unfortunately, VxWorks is not supported by the official ACE/TAO developers, so there was no direct help for fixing this problem. As a port of ACE/TAO was completely out of the scope of this thesis, the ACS VxWorks port was postponed until a proper ACE/TAO port is available. For this reason, this platform was not used in the current thesis. Currently, Remedy IT Company is interested in doing this job if ESO agrees to sponsor this complex port.

5.2 Simulating ALMA Requirements

The described ALMA requirements for the control of the CAN Bus (section 2.3) are clearly relative periodic requirements, because the time of each time event (TE) depends only on the last time event, and not directly on the physical time. But as the antenna movements depend on the sidereal time, this thesis will test both types of requirements. There are three significant values in the requirement specification: the global tick of ∆ = 48 [ms], the period for changing from monitor requests to commands and vice versa of ∆ = 24 [ms], and the period resolution of ∆ = 4 [ms] to successfully provide a silent frame between global ticks. The idea is to test ACS with these values using both types of requirements. The specification of the remaining requirement parameters is postponed to the end of the model application. A requirement $\omega(\Delta, a, b)$ will be denoted by $\omega_{\Delta \cdot 1000}$, that is, by its period in milliseconds. For example, a strict periodic global tick requirement of 48 [ms] will be denoted θ48. These synthetic requirements represent the usage of the ACS Time System as the global tick manager (48), as the phase transition between command and monitor signals (24), and as the queue dispatcher (4).

The following physical layout description and the software experimental setup are not arbitrary constructions. The physical deployment is strongly based on the ALMA Monitor and Control Bus, and it was designed in coordination with ALMA CONTROL software engineers. Through informal communications they agreed that the following physical layout is a good starting point for their long-term purposes: porting the ABM's RTAI code to RT-POSIX user-space calls, using an RT PREEMPT Linux kernel beneath. On the other hand, the software experimental setup is strictly based on the Time System user manual, using a realistic deployment of components, manager and clients, with no objections from the ALMA software engineers.

5.2.1 Requirement Simulation

As we have access neither to a CAN Bus nor to the ALMA devices, a simulated environment will be used (see figure 5.2). This environment consists of two computers connected by a serial port, which represents the data bus. In addition, a coordinated tick was set up using the parallel port interrupt pin, which represents the coordination tick. This coordination tick is managed by a custom kernel module that updates the system clock consistently with the tick. The idea is to provide the same time domain to both computers.

Figure 5.2: Physical Simulation Diagram.

5.3 Experimental Setup

The machines used in this section, and all the hardware and software setups are the ones described in section 4.3.1.

5.3.1 The ACS Client Test Program

The main idea of this thesis is to verify the ACS time system, so the first condition is to have the two time system components configured in a Container. This was done by configuring a CDB with these containers, which will be loaded at the first CORBA request. The next step is to use these components through the CORBA interface (IDL) that ACS provides. ACS components can interact with other distributed components, or with stand-alone clients. In terms of usage, clients and components share the same communication methods, so a test client program is enough to use the time system properly. Also, using a client makes it possible to define a high task priority directly, while a component would need special support from the container.

The client used here is a C++ application that connects to the ACS Manager and asks for the TIMER and CLOCK component references (see figure 5.3). Then the client registers a callback with a given interval for a periodic call in the TIMER component. When the callback is invoked, the client saves in memory the current time that the CLOCK component delivers. This process repeats for the given duration of the experiment, and at the end of the run the client saves the results to a file.

5.3.2 Concurrent Task Extension

An important issue in a distributed system like ACS is concurrency. To exercise it, the client also spawns new clients using fork, which simultaneously use the TIMER component with different frequencies, as figure 5.3 shows. The starting time of the experiment is also coordinated using the CLOCK component, and a preliminary global wait period was set to avoid start-up distortions.

5.3.3 External Reader Extension

As the measured time values of the clients are based on the same system that is being verified, they are not reliable. Therefore, the parent client process also sends a signal through the RS232 (serial) port (see figure 5.3). An external machine reads the signal and registers the timestamp using a high resolution timer. Here the external reader is the same program described in section 4.3.3. This external reader represents the devices connected to the CAN, with the idea of checking how the devices on the CAN perceive the system behavior.

5.4 Results

The following results are the maximum jitter and the mean √MSE of at least 10 realizations (default = 30), each one with an execution time of 20 seconds.
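To make the two reported quantities concrete, the following Python sketch computes them for a single realization. It is an illustration under assumptions rather than the thesis code: the function names are hypothetical, strict (θ) jitter is taken as the deviation from the absolute schedule anchored at the first event, and relative (φ) jitter as the deviation of each gap between consecutive events from the period.

```python
from math import sqrt


def strict_jitters(timestamps, period):
    """Theta-like: deviation of each event from the schedule t0 + i * period."""
    t0 = timestamps[0]
    return [t - (t0 + i * period) for i, t in enumerate(timestamps)]


def relative_jitters(timestamps, period):
    """Phi-like: deviation of each inter-event gap from the period."""
    return [(b - a) - period for a, b in zip(timestamps, timestamps[1:])]


def summarize(jitters):
    """Return (maximum absolute jitter, square root of the mean squared error)."""
    max_jitter = max(abs(j) for j in jitters)
    root_mse = sqrt(sum(j * j for j in jitters) / len(jitters))
    return max_jitter, root_mse


# Hypothetical realization in microseconds: 4 ms period, third event 1 ms late.
events_us = [0, 4000, 9000, 13000, 17000]
strict = strict_jitters(events_us, 4000)
relative = relative_jitters(events_us, 4000)
strict_summary = summarize(strict)
relative_summary = summarize(relative)
```

A single late event shifts every later event in the strict sequence, while it appears only once in the relative sequence. This is why the same realization can give quite different θ and φ values.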

5.4.1 Local Verification

The first results come from executing the task locally and alone, registering the timestamps with the ACS CLOCK component (table 5.1). The results are very poor, and clearly not real-time at all. For instance, the θ4 requirement has an error of more than 5000% in the worst-case jitter. In the case of soft real-time (MSE), the error is still more than 500%, so there is no chance for this setup to be real-time. But this could be the consequence of a bad application of the model: the model needs a precise and accurate timestamp function. As the timestamps are obtained from an ACS CLOCK component, each get_time() call uses the CORBA infrastructure to obtain the value. This problem was detected because, strangely, the φ values were very similar to the θ values, which points to a delay rather than to jitter.

Figure 5.3: Local Machine Setup

Requirement   Max Jitter   Mean √MSE
θ48           0.12208      0.02946
θ24           0.14779      0.07448
θ4            0.20617      0.03161
φ48           0.11967      0.00220
φ24           0.12417      0.00512
φ4            0.13910      0.00107

Table 5.1: Local verification using the CLOCK component timestamps

To fix this problem, the measurements were redone using the POSIX clock_gettime() function. As shown in section 4.3, this function complies with the POSIX real-time specification, and the resulting values, shown in table 5.2, are better. These new results are not good enough for hard real-time, but at least some values are fine for soft real-time. For instance, for θ48 and φ48 the Quality of Service is more than 98%. It is clear that the QoS is best for φ requirements, but in general terms neither type of requirement is reliable enough for the values that the ALMA software needs.

Requirement   Max Jitter   Mean √MSE
θ48           0.02536      0.00079
θ24           0.03244      0.00294
θ4            0.02225      0.00485
φ48           0.02252      0.00077
φ24           0.02170      0.00082
φ4            0.04263      0.00085

Table 5.2: Local verification using the clock_gettime() timestamps
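The percentages quoted for tables 5.1 and 5.2 can be reproduced by dividing each table value by the period of the requirement. The worked arithmetic below is a sketch that assumes the table values are expressed in seconds, and that the Quality of Service is read as one minus the relative √MSE error. Neither assumption is stated explicitly in this section, but both reproduce the quoted figures.

```latex
% Table 5.1, requirement \theta_4 (period \Delta = 0.004 s)
\frac{J_{\max}}{\Delta} = \frac{0.20617}{0.004} \approx 51.5 \quad (\approx 5150\%),
\qquad
\frac{\sqrt{MSE}}{\Delta} = \frac{0.03161}{0.004} \approx 7.9 \quad (\approx 790\%)

% Table 5.2, requirements \theta_{48} and \phi_{48} (period \Delta = 0.048 s)
QoS(\theta_{48}) \approx 1 - \frac{0.00079}{0.048} \approx 0.984,
\qquad
QoS(\phi_{48}) \approx 1 - \frac{0.00077}{0.048} \approx 0.984
```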

              Harmonic                   Non-Harmonic
Requirement   Max Jitter   Mean √MSE     Max Jitter   Mean √MSE
θ48           0.10969      0.00468       0.02696      0.00833
θ24           0.15417      0.00922       0.10239      0.00501
θ4            0.18233      0.02101       0.17445      0.01543
φ48           0.096743     0.00590       0.018387     0.00195
φ24           0.087959     0.00162       0.023513     0.00165
φ4            0.141189     0.00208       0.103499     0.00121

Table 5.3: Local Verification with 8 harmonic/non-harmonic tasks

As the timeout for device responses is 150 [µs], it can be assumed that the timeout of the server is in the same range. Therefore, the a and b values can be set to this value, so the requirements can be fixed as, for example, $\theta_{48} = \theta(48 \cdot 10^{-3}, 150 \cdot 10^{-6}, 150 \cdot 10^{-6})$. Then the requirement θ48 will accept an error of about 0.3%, θ24 about 0.6%, and θ4 about 3.75%. The same values are applicable to the φ requirements. As can be seen in table 5.2, the verification will not hold in any case, so we can confirm that the ACS TIMER (and the ACS CLOCK) do not comply with the real-time values that the ALMA CONTROL system needs.
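The tolerance percentages and the failed verification can be checked numerically against table 5.2. The Python sketch below is only illustrative: the names are hypothetical, the table values are assumed to be in seconds, and the [hard, soft] verdict is assumed to compare the maximum jitter with a and √MSE with b.

```python
TOLERANCE_S = 150e-6  # a = b = 150 µs, taken from the device response timeout

PERIODS_S = {"48": 0.048, "24": 0.024, "4": 0.004}

# (max jitter, mean sqrt(MSE)) copied from table 5.2, assumed to be in seconds.
TABLE_5_2 = {
    "theta48": (0.02536, 0.00079),
    "theta24": (0.03244, 0.00294),
    "theta4": (0.02225, 0.00485),
    "phi48": (0.02252, 0.00077),
    "phi24": (0.02170, 0.00082),
    "phi4": (0.04263, 0.00085),
}


def tolerance_percent(period_s, tolerance_s=TOLERANCE_S):
    """Tolerated error as a percentage of the period."""
    return 100 * tolerance_s / period_s


def verdicts(table, tolerance_s=TOLERANCE_S):
    """Assumed reading: [hard, soft] = [max jitter <= a, sqrt(MSE) <= b]."""
    return {
        name: [max_jitter <= tolerance_s, root_mse <= tolerance_s]
        for name, (max_jitter, root_mse) in table.items()
    }


percentages = {name: tolerance_percent(p) for name, p in PERIODS_S.items()}
results = verdicts(TABLE_5_2)
```

Even the smallest √MSE in table 5.2 (0.00077) is about five times the 150 [µs] tolerance, so every requirement fails both checks.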

5.4.2 Local Verification with Concurrency

It is almost obvious that if the local verification fails, the concurrent verification must fail as well. Regardless, this thesis presents the results using concurrent tasks as an example of the model application for further usage, and to analyze whether several tasks affect the service performance. Table 5.3 summarizes the results for eight harmonic and non-harmonic tasks. In general terms, concurrency has a direct effect on the jitter and on the MSE. Most of the values are several times higher than those of the single task of table 5.2, in particular for the harmonic tasks. Also, harmonic tests generally show worse jitter and MSE than the non-harmonic ones, because there are some time points where all 8 processes of the harmonic test request to be woken simultaneously, while the non-harmonic requests are less predictable but better distributed over the time line. It also seems that for higher frequencies the jitter and MSE increase because the load of the timer is much higher.

In summary, the timer does not manage the concurrency with any real-time mechanism, so each new task adds new delays to the whole system. It is clear that these values are not suitable for the ALMA requirements, but if the time system is improved to support real-time requirements, concurrency should be taken into account because it produces different results, with a clear difference between harmonic and non-harmonic tasks.

Requirement   Max Jitter   Mean √MSE
θ48           0.02866      0.00112
θ24           0.03103      0.00152
θ4            0.05149      0.01227
φ48           0.02838      0.00041
φ24           0.02995      0.00063
φ4            0.03857      0.00058

Table 5.4: External Verification Results
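The explanation given for the harmonic case in section 5.4.2 can be illustrated by counting how often all tasks ask to be woken at the same instant. The Python sketch below uses hypothetical period sets, since the periods of the eight test tasks are not listed in this section: a harmonic set in which each period doubles the previous one, and a non-harmonic set with no common multiple inside the horizon.

```python
def simultaneous_releases(periods_ms, horizon_ms):
    """For each instant in 1..horizon, count the tasks released at that instant."""
    return {
        t: sum(1 for p in periods_ms if t % p == 0)
        for t in range(1, horizon_ms + 1)
    }


def full_collisions(periods_ms, horizon_ms):
    """Number of instants at which every task is released at once."""
    counts = simultaneous_releases(periods_ms, horizon_ms)
    return sum(1 for c in counts.values() if c == len(periods_ms))


# Hypothetical period sets in milliseconds, eight tasks each.
harmonic = [4, 8, 16, 32, 64, 128, 256, 512]
non_harmonic = [4, 5, 7, 9, 11, 13, 17, 19]

harmonic_hits = full_collisions(harmonic, 2048)
non_harmonic_hits = full_collisions(non_harmonic, 2048)
```

With these sets, the harmonic tasks all coincide at every multiple of 512 [ms], while the non-harmonic tasks never all coincide within the same horizon, which matches the qualitative argument that their requests are better distributed over the time line.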

5.4.3 Clock Drift Measurement

Before an external verification can be done, the clock drift problem must be considered. The idea of the coordination tick is to provide the same time domain to both machines, but this must be verified. The clock drift corrections of section 3.3 are intended to correct time domains. To do this, a common reference clock was needed to obtain the clock drifts. In section 4.3.3 an NTP server was used to obtain the clock drifts of each system. This section uses the same mechanism to measure the clock drifts of the systems coordinated by the external tick. Figure 5.4(a) shows the clock drifts of the two systems without using the tick coordination. Figure 5.4(b) shows the same clock drifts when the clocks are updated with the coordinated tick. As the slopes of both system drifts are the same, the domain transformation reduces to $D_{Ext \to Sys}(R) = R$, because the quotient of the drifts is 1. Therefore, as expected, the setup with tick coordination does not need to be corrected.

(a) Clock drift measures over 300 seconds without coordination tick

(b) Clock drift measures over 300 seconds with coordination tick

Figure 5.4: Clock Drift Behavior With and Without Tick Coordination

5.4.4 External Verification

Table 5.4 summarizes the results of measuring the timestamps using an external machine. This experiment suggests that the I/O of the serial port does not noticeably increase the jitter of the system: only θ4 gives significantly worse max jitter and MSE values than in table 5.2. But as the local measurements already present very high jitters, the effect of the serial port usage could be hidden. Therefore, if the time system is improved, external verification is needed to ensure a complete analysis. In chapter 4 a similar deployment was verified, and the usage of the serial port introduced jitters near 1 [ms]. Here such errors would pass undetected due to the inherent jitter of the time system.

Chapter 6

Conclusions and Future Work

This thesis has three major contributions: the empirical verification model, the ACS real-time platform support review, and the ACS Time System verification of periodic properties. Currently there is debate on what form real-time support in ALMA should take, and the current work is a first step towards making an informed decision in this vital area.

6.1 The Verification Model

This thesis presents two classes of basic periodic requirements, which are a substantial subset of real-world real-time requirements. As seen in the results of the experiments, the θ and φ requirements give very different jitter and MSE results, while a semi-formal description does not distinguish between these two cases, producing misgivings, confusion and errors. This means that the first step when specifying a periodic requirement is to identify which class is a better fit.

This empirical model does not intend to compete with or replace formal verification techniques, because the completeness of this approach is very weak. The model intends to provide engineers and developers with tools to stress test their implementations, and to use some important results from formal verification such as the decidability of the MITL formulas. Following the same idea, including new extensions and synthetic values deduced from the formal specification and verification could be very interesting and fruitful. Also, the introduced TUF functions could be used if the utility domain is formalized, for instance using fuzzy logic. Therefore, an interesting direction is to include custom TUF functions in this model.

The flexibility of the model is high, because the only system-related requirement is a timestamp routine to register the realizations. This model can be applied to a range of real-time systems and setups, from a scheduling algorithm to a large distributed real-time framework. In this thesis, the examples considered were Linux RT PREEMPT, QNX and the ACS Time System, but further work could compare CORBA middleware implementations, or scheduling algorithms.

6.2 ACS Real-Time Review

ACS was not designed for real-time, but as the astronomical instrumentation domain is often related to time-critical hardware, engineers know that real-time systems will always be part of the system requirements. That is why the ACS Team developed the framework over ACE/TAO, and provides explicit support for VxWorks. Of the three studied platforms, only one could be installed without problems.

• Linux RT PREEMPT: The port to this platform was successfully made, by patching the C++ code of ACS to be supported on a newer GCC. Currently it is supported by the ACS-UTFSM Team, and the patches will soon be part of the official ACS distribution.

• VxWorks: The port was assigned to the author of this thesis, and currently ACS compiles and executes on the desired machines, but as ACE/TAO fails, the official support was delayed waiting for a stable ACE/TAO version for this commercial RTOS.

• QNX: The code of ACS can be compiled over QNX, but a full refactoring of the ACS build system is needed to construct a complete runtime environment. This is a known problem of ACS that the ACS-UTFSM Team is trying to fix with an alternative build system using autotools.

These ports focus only on compiling and running ACS over real-time platforms such as Linux RT PREEMPT, QNX or VxWorks, and do not guarantee any real-time behavior of ACS components. The idea is that ACS gets along with real-time software, but these efforts are not aimed at supporting real-time requirements inside ACS. The main reason for this is that verifying real-time requirements is very difficult on a framework such as ACS, so there is no benchmark against which to develop this support. Regardless, ACS could achieve soft real-time support over the Ethernet network by using the proven ACE/TAO real-time support. Alas, ALMA currently does not need such soft requirements (nor real-time over the Ethernet network), so that support is not in the ACS plans.

6.3 ACS Time System Verification

Clearly, the ACS Time System does not support real-time requirements, neither hard nor soft, but the application example of the verification model will help to improve this. ACS does not use the soft real-time support that the TAO ORB provides, and does not offer an alternative interface for supporting local hard real-time requirements either. These two issues could be addressed by an ACS Real-Time Service that replaces the current component-based ACS Time System. The idea of this new service would be to provide a common infrastructure for real-time software using ACS. On one side there would be a hard real-time service (for local use), as a wrapper of the POSIX real-time calls. This service should provide a common method of defining thread priorities, scheduling policies, etc. On the other side, a distributed soft real-time service could be developed by using the ACE/TAO bindings for real-time. This service must be compared to other alternatives (such as the current ALMA work-arounds using RTAI and alternative implementations running directly over a real-time Linux kernel), and its performance checked against the ALMA real-time requirements. For these validations, the verification model could be used as a benchmark framework, and the programs developed for this thesis could be used as a starting point for testing the new ACS Real-Time Service.

Bibliography

[Abb06] Doug Abbott. Linux for Embedded and Real-Time Applications. Newnes, second edition, 2006.

[AFH91] Rajeev Alur, Tomas Feder, and Thomas A. Henzinger. The benefits of relaxing punctuality. In Symposium on Principles of Distributed Computing, pages 139–152, 1991.

[AG93] A. P. Atlee and H. Gannon. Specifying and verifying requirements of real-time systems. IEEE Transactions Software Engineering, 19(1):41–55, 1993.

[AH90] R. Alur and T. A. Henzinger. Real-time logics: Complexity and expressiveness. In Proceedings of the Fifth Annual IEEE Symposium on Logic in Computer Science, pages 390–401, Washington, D.C., 1990. IEEE Computer Society Press.

[AH91] R. Alur and T. A. Henzinger. Logics and Models of Real-Time: A Survey, volume 600 of Lecture Notes in Computer Science, pages 74–106. Springer Verlag, 1991.

[Alv87] Angel Alvarez. Real-time programming and priority interrupt systems. In Proceedings of the First International Workshop on Real-Time Ada Issues (IRTAW '87), pages 97–100, New York, NY, USA, 1987. ACM Press.

[Ava07] Jorge Avarias. Repackaging ACS for embedded systems. Technical Report 12/2007, Departamento de Informática, Universidad Técnica Federico Santa María, 2007.

[BJMO06] Françoise Bellegarde, Jacques Julliand, Hassan Mountassir, and Emilie Oudot. Experiments in the use of t-simulations for the components-verification of real-time systems. In Proceedings of the 2006 Conference on Specification and Verification of Component-Based Systems (SAVCBS '06), pages 33–40, New York, NY, USA, 2006. ACM.

[BM00] P. Bosch and S. J. Mullender. Real-time disk scheduling in a mixed-media file system. In Proceedings of the Sixth IEEE Real-Time Technology and Applications Symposium (RTAS 2000), pages 23–32, May 2000.

[BM05] Sanjeev Baskiyar and Natarajan Meghanathan. A survey of contemporary real-time operating systems. Informatica (Slovenia), 29(2):233–240, 2005.

[Bou08] Patricia Bouyer. Model-checking timed temporal logics. In Electronic Notes in Theoretical Computer Science. Elsevier Science Publisher, 2008.

[Bro97] G. Brose. JacORB: Implementation and design of a Java ORB, 1997.

[BSV94] Felice Balarin and Alberto L. Sangiovanni-Vincentelli. Iterative algorithms for formal verification of embedded real-time systems. In Proceedings of the 1994 IEEE/ACM International Conference on Computer-Aided Design (IC- CAD ’94), pages 450–457, Los Alamitos, CA, USA, 1994. IEEE Computer Society Press.

[BY97] Michael Barabanov and Victor Yodaiken. Introducing real-time Linux. Linux Journal, 1997(34es):5, 1997.

[C+02] Gianluca Chiozzi et al. CORBA-based common software for the ALMA project. In Proceedings of SPIE 2002, 2002.

[C+04] Gianluca Chiozzi et al. The ALMA Common Software: A developer friendly CORBA-based framework. In Proceedings of SPIE 2004, 2004.

[CHL99] P. Carrere, J. Hermant, and G. Le Lann. In pursuit of correct paradigms for object-oriented real-time distributed systems. In Proceedings of the 2nd IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC ’99), pages 271 – 279, May 1999.

[Col] Jim Collier. An overview tutorial of the VxWorks real-time operating system.

[CW76] H. J. Curnow and B. A. Wichmann. A synthetic benchmark. Computer Journal, 19(1):43–49, 1976.

[DKK] L.R. Dalesio, M. R. Kraimer, and A. J. Kozubal. EPICS overview.

[DM03] L. Dozio and P. Mantegazza. Real time distributed control systems using RTAI. In Sixth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, pages 11–18, May 2003.

[Dou03] Bruce Powel Douglass. Real-Time Design Patterns: Robust Scalable Architec- ture for Real-Time Systems. Addison-Wesley, Boston, 2003.

[Dru96] J. Drummond. Establishing a real-time distributed benchmark. In Proceed- ings of the 4th International Workshop on Parallel and Distributed Real-Time Systems, pages 200–201, April 1996.

70 [DSW90] P. Donohoe, R. Shapiro, and N. Weiderman. Hardstone benchmark results and analysis. Technical Report CMU/SEI-90-TR-7, Carnegie Mellon University – Software Engineering Institute, 1990.

[FLRS04] Shahrooz Feizabadi, Peng Li, Binoy Ravindran, and Syed Suhaib. A for- mally verified application-level framework for real-time scheduling on POSIX real-time operating systems. IEEE Transactions on Software Engineering, 30(9):613–629, 2004.

[FMW+00] R. Freedmen, J. Maurer, V. Wolfe, S. Wohlever, M. Milligan, and B. Thu- raisingham. Benchmarking real-time distributed object management systems for evolvable and adaptable command and control applications. In Proceed- ings of the 3rd IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC 2000), pages 202–205, March 2000.

[FWDC+00] Victor Fay-Wolfe, Lisa C. DiPippo, Gregory Copper, Russell Johnston, Peter Kortmann, and Bhavani Thuraisingham. Real-time CORBA. IEEE Transac- tions on Parallel and Distributed Systems, 11(10):1073–1089, 2000.

[GMS94] Kaushik Ghosh, Bodhisattwa Mukherjee, and Karsten Schwan. A survey of real-time operating systems. Technical Report GIT-CC-93/18, Georgia Insti- tute of Technology, Atlanta, Georgia, 1994.

[GPSS80] Dov Gabbay, Amir Pnueli, Saharon Shelah, and Jonathan Stavi. On the tem- poral analysis of fairness. In Proceedings of the 7th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’80), pages 163– 173, New York, NY, USA, 1980. ACM.

[Gri] Duncan Grisby. The omniORB version 4.1 user’s guide.

[Har87] David Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8(3):231–274, June 1987.

[Hei98] Constance Heitmeyer. On the need for “practical” formal methods. Lecture Notes in Computer Science, 1486:18–26, 1998.

[Hil92] Dan Hildebrand. An architectural overview of QNX. In Proceedings of the Workshop on Micro-Kernels and Other Kernel Architectures, pages 113–126, Berkeley, CA, USA, 1992. USENIX Association.

[HM96] Constance Heitmeyer and Dino Mandrioli. Formal Methods for Real-Time Computing. John Wiley & Sons, Inc., New York, NY, USA, 1996.

[HVC07] Jinfeng Huang, Jeroen Voeten, and Henk Corporaal. Predictable real-time software synthesis. Real-Time Systems, 36(3):159–198, 2007.

71 [IEE94] IEEE. POSIX Part 1: System API – Amend. 1: Realtime Extension [C Lan- guage]. 1994.

[IEE96] IEEE. 1996 (ISO/IEC) [IEEE/ANSI Std 1003.1, 1996 Edition] Information Technology – Portable Operating System Interface (POSIX R ) – Part 1: System Application Program Interface (API) [C Language]. 1996.

[Int95] International Standards Organization. Information technology – Programming languages – Ada. ISO/IEC 8652:1995(E), 1995.

[JLT85] E. Jensen, C. Locke, and H. Tokuda. A time driven scheduling model for real- time operating systems. In Proceedings IEEE Real-Time Systems Symposium, pages 112–122, 1985.

[Kim00] K. H. Kim. APIs for real-time distributed object programming. IEEE Com- puter, 33(6):72–80, June 2000.

[Kim04] K. Kim. Fundamental research challenges in real-time distributed computing. In Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2004), pages 2–9, May 2004.

[Koy87] Ron Koymans. Specifying message passing systems requires extending temporal logic. In Temporal Logic in Specification, pages 213–223, 1987.

[KSK04] A. Krishna, D. Schmidt, and R. Klefstad. Enhancing real-time CORBA via real-time Java features. In Proceedings of the 24th International Conference on Distributed Computing Systems, pages 66–73, 2004.

[KVdR83] Ron Koymans, Jan Vytopil, and Willem P. de Roever. Real-time programming and asynchronous message passing. In Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing (PODC ’83), pages 187– 197, New York, NY, USA, 1983. ACM Press.

[Lap92] Phillip A. Laplante. Real-Time Systems Design and Analysis: An Engineer’s Handbook. IEEE Press, Piscataway, NJ, USA, 1992.

[LB94] G. Li and J. Bacon. Supporting distributed real-time objects. In Proceedings of the Second Workshop on Parallel and Distributed Real-Time Systems, pages 138–143, April 1994.

[Mar65] James Martin. Programming Real-Time Computer Systems. Prentice-Hall, 1965.

[MDP00] P. Mantegazza, E. L. Dozio, and S. Papacharalambous. RTAI: Real Time Application Interface. Linux Journal, 2000(72es):10, 2000.

72 [Meo05] Fabrizio Meo. Open controller enabled by an advanced real-time network (OCEAN). In The Industrial Information Technology Handbook, pages 0–. 2005.

[MW85] Stephen J. Mellor and Paul T. Ward. Implementation Modeling Techniques, volume 3 of Structured Development for Real-Time Systems. Yourdon Press Computing Series, 1985.

[MW86a] Stephen J. Mellor and Paul T. Ward. Essential Modeling Techniques, volume 2 of Structured Development for Real-Time Systems. Yourdon Press Computing Series, 1986.

[MW86b] Stephen J. Mellor and Paul T. Ward. Introduction and Tools, volume 1 of Structured Development for Real-Time Systems. Yourdon Press Computing Series, 1986.

[NJ83] John Nagle and Scott Johnson. Practical program verification: Automatic program proving for real-time embedded software. In Proceedings of the 10th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Lan- guages (POPL ’83), pages 48–58, New York, NY, USA, 1983. ACM.

[Obj] Object Management Group. The object management group webpage.

[Obj02] Object Management Group. CORBA 3.0 - formal/02-06-01, 2002.

[Obj03] Object Management Group. Real-time CORBA, v2.0 – formal/03-11-01, 2003.

[OW07] Jo¨elOuaknine and James Worrell. On the decidability and complexity of Metric Temporal Logic over finite words. Logical Methods in Computer Science, 3(1):1– 27, February 2007.

[P+02] Mark Plesko et al. ACS – The advanced control system. In 4th International Workshop on Personal Computers and Particle Accelerator Controls, 2002.

[Pis08] Joe Pisano. Current state of the decisions about real-time in ALMA. Personal communication, to be published, May 2008.

[Pnu77] Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on the Foundations of Computer Science, pages 46–57, New York, 1977. IEEE Computer Society.

[PR00] A. Puder and K. Romer. MICO: An open source CORBA implementation, 2000.

[P.S] P.Sivera. VLT common software — Overview. VLT-MAN-ESO-17200-0888.

[Rab90] P. Rabindra. Implementing the Rhealstone real-time benchmark. Dr. Dobb’s Journal, 15(4):46–55, 1990.

73 [Rav02] B. Ravindran. Engineering dynamic real-time distributed systems: Architec- ture, system description language, and middleware. IEEE Transactions on Software Engineering, 28(1):30–57, January 2002.

[RCG01] Gianni Raffi, Gianluca Chiozzi, and Brian Glendenning. ALMA Common Soft- ware (ACS) as a basis for a distributed software development. In Proceedings of the 11th Astronomical Data Analysis Software & Systems Conference, 2001.

[RH07] S. Rostedt and D. V. Hart. Internals of the RT patch. In Proceedings of the Linux Symposium 2007, 2007.

[RJL05] Binoy Ravindran, E. Douglas Jensen, and Peng Li. On recent advances in time/utility function real-time scheduling and resource management. In Pro- ceedings of the Eighth IEEE International Symposium on Object-Oriented Real- Time Distributed Computing (ISORC ’05), pages 55–60, Washington, DC, USA, 2005. IEEE Computer Society.

[S+04] Heiko Sommer et al. Container-component model and XML in ALMA ACS. In Proceedings of SPIE 2004, 2004.

[SC03] Heiko Sommer and Gianluca Chiozzi. Transparent XML binding using the ALMA Common Software (ACS) container/component framework. In Proceed- ings of the 13th Astronomical Data Analysis Software & Systems Conference, 2003.

[Scha] Douglas Schmidt. The ACE library webpage.

[Schb] Douglas Schmidt. The ACE ORB webpage.

[Sch99] Douglas C. Schmidt. Middleware techniques and optimizations for real-time, embedded systems. In Proceedings of the 12th International Symposium on System Synthesis (ISSS 99), page 12, Washington, DC, USA, 1999. IEEE Com- puter Society.

[Seg95] Roberto Segala. Modeling and Verification of Randomized Distributed Real- Time Systems. PhD thesis, Laboratory for Computer Science, Massachusetts Institute of Technology, 1995.

[SGG04] Avi Silberschatz, Peter Baer Galvin, and Greg Gagne. Operating System Con- cepts. John Wiley, seventh edition, 2004.

[Sie06] Sam Siewert. Real-Time Embedded Components and Systems (Computer En- gineering). Charles River Media, Inc., Rockland, MA, USA, 2006.

[SJ92] N. B. Serbedzija and S. Jahnichen. High-level real-time distributed program- ming. In Proceedings of the Third Workshop on Future Trends of Distributed Computing Systems, pages 72–78, April 1992.

74 [SK00] Douglas C. Schmidt and Fred Kuhns. An overview of the real-time CORBA specification. Computer, 33(6):56–63, 2000.

[SRR03] H. Sertic, F. Rus, and R. Rac. UML for real-time device driver development. In Proceedings of the 7th International Conference on Telecommunications (Con- TEL 2003), pages 631–636, June 2003.

[Sta96] John A. Stankovic. Real-time and embedded systems. ACM Computing Sur- veys, 28(1):205–208, 1996.

[Tar08] Massimo Tarenghi. The Atacama Large Millimeter/submillimeter Array: Overview & status. Astrophysics and Space Science, 313(1-3):1–7, January 2008.

[TGFB02] Andr´es Terrasa, Ana Garc´ıa-Fornes, and Vicente J. Botti. Flexible real-time Linux*: A flexible hard real-time environment. Real-Time Systems, 22(1- 2):151–173, 2002.

[TWS06] Lothar Thiele, Ernesto Wandeler, and Nikolay Stoimenov. Real-time interfaces for composing real-time systems. In Proceedings of the 6th ACM & IEEE International Conference on Embedded Software (EMSOFT ’06), pages 34–43, New York, NY, USA, 2006. ACM Press.

[WDS90] Nelson Weiderman, Patrick Donohoe, and Ruth Shapiro. Benchmarking for deadline-driven computing. In Proceedings of the Conference on TRI-Ada ’90, pages 254–264, New York, NY, USA, 1990. ACM.

[Wel90] A. J. Wellings. Real-time requirements. In Proceedings of the Fourth Inter- national Workshop on Real-time Ada Issues (IRTAW ’90), pages 1–16, New York, NY, USA, 1990. ACM Press.

[Yam04] N. Yamasaki. Responsive multithreaded processor for distributed real-time control. In Proceedings of the 8th IEEE International Workshop on Advanced Motion Control (AMC ’04), pages 457–462, March 2004.

[YB] V. Yodaiken and M. Barabanov. A real-time Linux. Online at http://rtlinux.cs.nmt.edu/rtlinx/u.pdf.

75