Execution Time Monitoring in

Mikael Asberg,˚ Thomas Nolte Clara M. Otero Perez´ Shinpei Kato MRTC/Malardalen¨ University NXP Semiconductors/Research The University of Tokyo P.O. Box 883, SE-721 23, High Tech Campus 32 (102) 7-3-1 Hongo, Bunkyo-ku, Vaster¨ as,˚ Sweden 5656 AE Eindhoven, Holland Tokyo 113-8656, Japan [email protected] [email protected] [email protected]

Abstract estimate and the current system state. For example, if only one component is active, then it can run at the maximum This paper presents an implementation of an Execution QoS using 75% of a given resource. However, if a sec- Time Monitor (ETM) which can be applied in a resource ond identical component is started, then both components management framework, such as the one proposed in the will have to adapt to a lower QoS where each of them uses Open Media Platform (OMP) [4]. OMP is a European 50% of the same resource. project which aims at creating an open, flexible and re- In media processing systems, the load is very dynamic source efficient software architecture for mobile devices and the worst case is often unknown (unpredictable) or too such as cell phones and handsets. One of its goals is to high to do worst case resource allocation (since it will im- open up the possibility for software portability and fast ply wasting resources). The initial resource estimate pro- integration of applications, in order to decrease develop- vided by the component is used to create the initial allo- ment costs. The task of the ETM is to measure task exe- cation of resources. However, this initial allocation has to cution time and provide this information to the scheduler be monitored, at run time, to make sure that the allocation which then can schedule tasks in a more efficient and dy- matches the actual resource utilization. For example, the namic way. This implementation is our first step towards estimated processor cycles of decoding one frame of an a full resource management framework that later will in- H264 stream, measured in a given platform, can vary more clude a hierarchical scheduler, for soft real-time systems. than 5 times depending on the video content (and type of frame, number of reference frames .etc). The RMF keeps track of the actual usage (with the use of an ETM) and 1 Introduction signals an alarm when a component consistently under or over utilizes its initial allocation. This alarm mechanism Our previous work includes, among others, a synchro- enables the upper software layers to learn and adapt the nization protocol for hierarchical [9] and the resource allocation to the real need. The aim of the ETM implementation of a two-level Hierarchical Scheduling is to measure CPU resource utilization and give this in- Framework (HSF) in VxWorks [8]. This paper presents a formation to the RMF, in order for it to conduct resource continuation of our research and aims towards HSF in soft allocation adjustments during runtime. The implementa- real-time scheduling, such as media applications. HSF tion of the ETM is done in Linux because it has a high provides temporal isolation among concurrent executing degree of functionality (an abundance of libraries and de- applications. This isolation enables guaranteed resource vice drivers) in embedded systems and mobile phones [2]. availability for multiple media applications in the context of open platforms, such as the one targeted by the Open Resource management The Resource manager (Fig- Media Platform (OMP) [4]. In this project, media com- ure 1) is in charge of allocating/deallocating resources to ponents encapsulating functionality (such as video/audio application components. It also provides admission con- decoders) have the capability to trade Quality Of Service trol (checking feasibility before allocating requested re- (QoS) for resource usage, e.g., memory or CPU resources. sources) and enforcement to ensure that the allocated re- The aim of the OMP project is to standardize the inter- sources are also guaranteed. The resource manager has faces, used to adapt QoS, to achieve portability, reusability an overview of the system resources, whereas for ev- and easy integration. In particular, to trade resources, the ery shared resource, the corresponding Resource con- Resource Management Framework (RMF) offers an in- troller implements the resource allocation and monitoring terface to allocate a guaranteed amount of resource (e.g., mechanisms for that resource. For each resource, the re- CPU) to components, based on the component resource source controller keeps track of the allocation of resources ∗The work in this paper is supported by the Swedish Foundation for (Resource budget), its users (Resource users which is Strategic Research (SSF), via the research programme PROGRESS. basically components) and the runtime usage (Resource usage) of a given resource user on a given resource bud- then iterates trough a chain of scheduling classes (Fig- get. ure 3), where each of these classes tries to fetch the next running task. Eventually, idle class will pick a next task to Resource manager run if no higher class has succeeded with this. All schedul- ing classes share the same interface (set of functions), this Resource controller Resource controller Resource usage 35% Resource budget 16% makes it is easy to add new scheduling classes without changing the core scheduler. Resource user

Resource usage 39% Resource (memory) Resource (CPU) class = sched class highest /* The latter is a global kernel variable, 1st: real-time scheduling class. */ Figure 1. Resource management WHILE (TRUE) p = class.pick next task(run queue) IF(p) CPU resource allocation The CPU resource controller RETURN p is realized with a HSF, illustrated in Figure 2. Our next ENDIF step is to implement this scheduler with minimum mod- class = class.next /* 2nd: fair scheduling class. */ ification on the , similar to the imple- ENDWHILE mentation in [8]. The Global scheduler will schedule the Budgets according to their allocation (T ime interface) Figure 3. Sample pseudo-code of function pick next task, and an arbitrary scheduling scheme. T asks are scheduled which is part of the Linux scheduler (sched.c) within the budget according to the scheduling strategy of the Local scheduler. The global scheduler schedules the The outline of the paper is as follows: Section 2 budgets periodically, the budget runs according to a pre- presents related work. In Section 3 we present the im- defined amount of time and the order of execution of the plementation of our ETM in Linux and discuss how it can budgets are decided by the budget priorities. The admis- be applied. Finally, Section 4 concludes. sion control checks whether the current configuration of budgets are schedulable. This check is done if new bud- 2 Related work gets are activated or if running budgets need to change pa- rameters in their time interface. Tasks within a budget are The AQuoSA framework [11] is a RMF based on CBS assumed to be schedulable, with respect to the budget time scheduling with advanced adaptive resource reservation. interface. Parameters in the budget time interface such as The framework is built on a patch which exports appro- period, execution time and priority may be dynamic, e.g., priate scheduling hooks. they may change during runtime. Unused budget may be In [12], the authors present a RMF consisting of server allocated to tasks belonging to other budgets (weak en- based scheduling of tasks and a predictable disk schedul- forcement). ing mechanism. CPU usage monitoring is used within the framework to calibrate application resource usage. Global scheduler RTAI [7] is a collection of loadable modules that pro- vides a rich real-time Application Programming Interface

Time interface Time interface Time interface (API) to the user, trough the usage of a hardware abstrac- Budget Budget Budget tion layer named ADEOS [1]. The RTAI API includes Local Local Local Scheduler Scheduler Scheduler add/delete hooks for every task start, switch and delete. RT-Linux [5] is a patch to the Linux kernel and intro- Task ... Task ... Task ... Task ... Task ... Task duces a layer between the OS and the hardware. Linux is scheduled as low priority task while real-time tasks have Figure 2. Hierarchical scheduling framework higher priority (similar to [7]). Scheduling classes in Linux Starting from Linux ker- Related to CPU allocation, cgroups [3] is a Linux nel version 2.6.23 (released in October of 2007) [6], patch which partitions processes into groups and give there is a distinction between real − time and regular them a specified share of the processor. Each partition is tasks. This is done by the introduction of two schedul- scheduled according to its period and will run a predefined ing classes: real-time and fair scheduling (sched rt.c amount of time. All groups must have the same period. and sched fair.c). Basically, a Linux process is either regular or real − time depending on which scheduling 3 Execution time monitoring policy that it is configured with. The processes which are real−time have higher priority (0-99 where 0 is the high- The ETM provides the resource manager with infor- est) than regular processes. Whenever there is a schedul- mation, in order for it to analyze the suitability and to ing event (time-slice expiration, task suspend/sleep, etc.), fit a resource budget for a given resource user(s). Most the CPU (for which the scheduling event belongs to) run- monitors are time-stamp based with a coarse time-base, queue (ready-queue) is fetched. The Linux core scheduler e.g., time consumed by a task during the past 5 seconds. In this case we get an average value, without knowledge hook fair. of whether the time was consumed from its own bud- get or from a different one. To accurately fit a periodic org pick next task rt = sched class highest.pick next task budget we need to monitor resource usage per scheduling sched class highest.pick next task = hook rt event, e.g., how much of a given budget allocation a task consumed during the past budget period. This, so called event based monitoring, also enables the possibility that Figure 5. Redirection of .pick next task in rt sched class a task can keep track of its own progress. A decoder that has decoded half of a frame can make a request to the monitor of how much budget that is left for the period. FUNCTION hook rt(run queue) In this way, it can derive whether it is able to decode the prev task = current /* The latter is a global kernel variable. */ remainder of the frame within this budget period. next task = org pick next task rt(run queue) task switch hook(prev task, next task, HOOK RT) RETURN next task Implementation The Linux operating system offers, to ENDFUNCTION the best of our knowledge, only the proc file − system and taskstats/cgroupstats [6] as a way to monitor pro- Figure 6. Function hook rt cess/thread information. However, these approaches are not suitable for our application for several reasons. It re- Applicability of the ETM This chapter gives a pro- quires additional interrupt-routines/tasks (extra overhead). posal of how the ETM can be used and also compares Alternatively, it must be fully integrated with the HSF. the two following approaches to monitor execution time However, this is not a modular solution. Also, the pre- in Linux: 1) having tasks or interrupts reading total exe- sented solutions [7, 13] in Section 2 require to much ker- cution time of tasks from the kernel (proc file − system nel modifications. We have chosen to implement our ETM or taskstats/cgroupstats) or, 2) by using task switch as a task switch hook interface (event based monitoring) hooks and timestamp task start and end. Note that ap- that can be used in kernel space. The hook is implemented proach 2) could alternatively be integrated in the HSF, in a kernel module (similar to our work in [10]) which which would not require it to have tasks or interrupts. makes it easy to modify (without the need to modify and However, as mentioned previously, it is more modular to re-compile the kernel). It is just a matter of loading the keep the monitor and scheduler separate. module into the kernel. Our interface is simply a function The main task of the ETM is to measure the execution called task switch hook which is called at every sched- time of a group of tasks within time intervals. The in- uler tick. The supplied information is the previous and tervals could be the Least Common Multiplier (LCM) of next running tasks thread identification (TID) and a flag, all task periods within a budget, defined as LCM . informing whether the task switch is between a regular budget This is illustrated in Figure 7, where the LCM of and real − time task (or vice versa) or not. budget the tasks within the budget is 120. If tasks are fixed within The task switch hook implementation requires a min- a budget, then LCM can be computed off-line. imum patch, simply by removing the const statement budget from each of the two data-structures fair sched class and rt sched class (Figure 4). Security and kernel stability is Task A LCM = 120 Task B budget Task A Task C Task B not considered in this implementation. Task A Task A Task C Task C Task B Task C Task A

0 20 40 60 80 100 120 { static const struct sched class rt sched class = Budget (Period=50, Exec. time=20) . Task A (Period=30, Priority=2)

. Task B (Period=60, Priority=1) Task C (Period=40, Priority=0) Budget period start Task period start . .pick next task = pick next task rt, Figure 7. Tasks executing within a budget

In this way, the CPU utilization of the tasks within Figure 4. Sample of scheduling class rt sched class a budget could be derived and compared against the re- (sched rt.c) source budget periodically (at each LCMbudget). With The reason for removing the read-only access for approach 1), it would require either to have one or sev- these two data-structures is because we can then reas- eral tasks or interrupt handlers running periodically (at sign the .pick next task (Figure 4), of a scheduling class, end and start of all budgets LCMbudget) and reading the to point to our own defined function (Figure 5). The task execution time counters. In the example in Figure 7, .pick next task (in fair sched class and rt sched class) the task/handler would run at time 0 and 120 and log the are called by the scheduler (Figure 3) whenever there tasks execution time counters at those two events. With should be a task context switch (at least at those oc- approach 2), if we assume that pick next task (Figure 3) casions). This implies that our function will also be is executed whenever the Linux scheduler is invoked, even called. The next step is to define hook rt (Figure 6) and if there is no task context switch, then our monitor will run at the precise moments when LCMbudget starts and sible and it should also support task execution monitoring ends. This assumption is valid since the scheduler checks within different contexts. For example, a task may be exe- whether pick next task has chosen another task (other cuting within its own budget or in other budgets (depend- than the current running task) to run (according to the ing on the enforcement of the global scheduler). Also, the Linux scheduler function schedule() in sched.c). In the monitor should be as independent as possible with respect example in Figure 7, monitoring the tasks would start (and to the Hierarchical Scheduling Framework (HSF) in order end) when the hook is executed at period start of either the have a modular design. We have motivated why our task A, B or C at time 0 and 120 (our hook will notice this solution (implementation of the task switch hook mecha- by checking the status of the run-queue). The advantage nism) is best fitted for monitoring in a RMF in Linux. Im- with approach 2) is that it does not add to extra context plementing task switch hooks seems to impose less over- switch overhead (besides the execution of the monitor) head in comparison with reading task execution informa- since the task context switch overhead would exist even tion from the kernel. This makes it interesting (and will if we did not use the hook. Also, the monitor is executed be part of our future work) to perform an evaluation of our more precisely (than approach 2 which would have to rely solution together with, for example, proc file − system on a periodic timer) at the moments when the measuring and taskstats/cgroupstats [6] solutions. periods start and end. Future work will include the implementation of a HSF If the global scheduler uses weak enforcement then any in Linux. The task switch hook implementation presented unused budget will be given to background tasks (belong- in this paper can be useful in that it can suppress the Linux ing to other budgets). This approach increases CPU uti- scheduler functionality (because we are in control of the lization. The problem here is that approach 1) would con- actual task switch). The goal however, is to implement a sider task execution time within other budgets (Not Al- hierarchical scheduler in Linux as a middleware (similar located Consumed (NAC)) as execution time within its to [8]) without any patches of the kernel. This requirement own budget (Allocated Consumed (AC)), since the total is inherent from embedded systems, which have higher re- task execution time counter cannot know the difference in liability and stability requirements and they prefer to use which context the task executes. Approach 2) would con- proven versions of the Linux kernel. None of the tech- sider in which context the task executes in. At each hook niques/solutions presented in Section 2 [5, 7, 11, 12] ful- execution instance, it could check the Task Control Block fills this requirement except for using cgroups [3]. The fi- (TCB) or the global scheduler to see if a task is within nal step is to perform an evaluation of our HSF and prefer- NAC or AC. ably compare it against other similar schedulers [11, 12] The motivation for monitoring AC and NAC time is as well as native Linux mechanisms such as cgroups [3]. that it reveals not only if the budget supply is sufficient enough but also, if budget execution time is properly dis- References tributed. A task may use NAC time if it has insufficient budget supply and/or if the budget execution time distri- [1] Adeos. http://home.gna.org/adeos. bution is insufficient (task is mostly active when there is [2] Limo Foundation. http://www.limofoundation.org. no budget execution time available). This is illustrated in [3] LXR. http://lxr.linux.no. Figure 8 where example a) shows that the amount of bud- [4] Open Media Platform project. get supply is sufficient with respect to the task demand, http://www.openmediaplatform.eu. but the budget execution time distribution is inadequate. [5] RTLinuxFree. http://www.rtlinuxfree.com/. In b), clearly the budget supply is insufficient, that is why [6] The Linux Kernel Archives. the task uses NAC time. http://www.kernel.org. [7] Rtai-the realtime application interface for linux from diapm. https://www.rtai.org. ˚ a) [8] M. Behnam, T. Nolte, I. Shin, M. Asberg, and R. J. Bril. Towards hierarchical scheduling on top of vxworks. In 0 b) OSPERT 08, July 2008.

NAC [9] M. Behnam, I. Shin, T. Nolte, and M. Nolin. Sirap: A Budget Task AC Budget period start Task period start synchronization protocol for hierarchical resource sharing in real-time open systems. In EMSOFT 007, 2007. Figure 8. NAC and AC execution time [10] S. Kato and N. Yamasaki. Modular real-time linux. In RT LW S008, October 2008. 4 Summary [11] L. Palopoli, T. Cucinotta, L. Marzario, and G. Lipari. Aquosa—adaptive quality of service architecture. Softw. Task execution monitoring in Linux is challenging due Pract. Exper., 39(1):1–31, 2009. to its lack of real-time support. Most real-time oper- [12] R. Rajkumar, K. Juvva, A. Molano, and S. Oikawa. Re- ating systems support the task switch hook mechanism, source kernels: a resource-centric approach to real-time which is a popular method for monitoring task execution and multimedia systems. pages 476–490, 2001. [13] C. Wright, C. Cowan, J. Morris, S. Smalley, and G. Kroah- time. With respect to a Resource Management Framework Hartman. Linux security modules: General security sup- (RMF), such as the one proposed in Open Media Platform port for the linux kernel. In USENIX002, August 2002. (OMP) [4], a monitor should give as exact results as pos-