Hrtimers and Beyond: Transforming the Linux Time Subsystems

Hrtimers and Beyond: Transforming the Linux Time Subsystems Thomas Gleixner Douglas Niehaus linutronix University of Kansas [email protected] [email protected] Abstract will complement and complete the hrtimers and timeofday components to create a solid foun- dation for architecture independent support of Several projects have tried to rework Linux high-resolution timers and dynamic ticks. time and timers code to add functions such as high-precision timers and dynamic ticks. Pre- vious efforts have not been generally accepted, in part, because they considered only a sub- 1 Introduction set of the related problems requiring an integrated solution. These efforts also suffered significant architecture dependence creating com- Time keeping and use of clocks is a fundamen- plexity and maintenance costs. This paper tal aspect of operating system implementation, presents a design which we believe provides a and thus of Linux. Clock related services in op- generally acceptable solution to the complete erating systems fall into a number of different set of problems with minimum architecture de- categories: pendence. • time keeping The three major components of the design are hrtimers, John Stulz’s new timeofday approach, • clock synchronization and a new component called clock events. Clock events manages and distributes clock • time-of-day representation events and coordinates the use of clock event • next event interrupt scheduling handling functions. The hrtimers subsystem has been merged into Linux 2.6.16. Although • process and in-kernel timers the name implies “high resolution” there is no change to the tick based timer resolution at • process accounting this stage. John Stultz’s timeofday rework ad- • process profiling dresses jiffy and architecture independent time keeping and has been identified as a fundamen- tal preliminary for high resolution timers and These service categories exhibit strong interac- tickless/dynamic tick solutions. This paper pro- tions among their semantics at the design level vides details on the hrtimers implementation and tight coupling among their components at and describes how the clock events component the implementation level. 334 • Hrtimers and Beyond: Transforming the Linux Time Subsystems Hardware devices capable of providing clock 2 Previous Efforts sources vary widely in their capabilities, accu- racy, and suitability for use in providing the de- sired clock services. The ability to use a given A number of efforts over many years have ad- hardware device to provide a particular clock dressed various clock related services and func- service also varies with its context in a unipro- tions in Linux including various approaches to cessor or multi-processor system. high resolution time keeping and event scheduling. However, all of these efforts have encoun- tered significant difficulty in gaining broad ac- Arch 1 ceptance because of the breadth of their impact Timekeeping TOD Clock source HW on the rest of the kernel, and because they gen- Tick ISR Clock event source HW erally addressed only a subset of the clock related services in Linux. Arch 2 Process acc. TOD Clock source HW Profiling Interestingly, all those efforts have a common ISR Clock event source HW design pattern, namely the attempt to inte- Jiffies grate new features and services into the existing Arch 3 clock and timer infrastructure without changing Timer wheel TOD Clock source HW the overall design. ISR Clock event source HW There are no projects to our knowledge which attempt to solve the complete problem as we understand and have described it. All existing Figure 1: Linux Time System efforts, in our view, address only a part of the whole problem as we see it, which is why, in Figure 1 shows the current software architec- our opinion, the solutions to their target prob- ture of the clock related services in a vanilla 2.6 lems are more complex than under our pro- Linux system. The current implementation of posed architecure, and are thus less likely to be clock related services in Linux is strongly asso- accepted into the main line kernel. ciated with individual hardware devices, which results in parallel implementations for each architecture containing considerable amounts of essentially similar code. This code duplication 3 Required Abstractions across a large number of architectures makes it difficult to change the semantics of the clock related services or to add new features such The attempt to integrate high resolution timers as high resolution timers or dynamic ticks be- into Ingo Molnar’s real-time preemption patch cause even a simple change must be made in led to a thorough analysis of the Linux timer so many places and adjusted for so many im- and clock services infrastructure. While plementations. Two major factors make im- the comprehensive solution for addressing the plementing changes to Linux clock related ser- overall problem is a large-scale task it can be vices difficult: (1) the lack of a generic ab- separated into different problem areas. straction layer for clock services and (2) the assumption that time is tracked using periodic timer ticks (jiffies) that is strongly integrated • clock sources management for time keep- into much of the clock and timer related code. ing 2006 Linux Symposium, Volume One • 335 • clock synchronization 3.2 Clock Synchronization • time-of-day representation Crystal driven clock sources tend to be impre- • clock event management for scheduling cise due to variation in component tolerances next event interrupts and environmental factors, such as tempera- ture, resulting in slightly different clock tick • eliminating the assumption that timers are rates and thus, over time, different clock val- supported by periodic interrupts and ex- ues in different computers. The Network Time pressed in units of jiffies Protocol (NTP) and more recently GPS/GSM based synchronization mechanisms allow the correction of system time values and of clock These areas of concern are largely independent source drift with respect to a selected standard. and can thus be addressed more or less inde- Value correction is applied to the monotoni- pendently during implementation. However, cally increasing value of the hardware clock the important points of interaction among them source. This is an optional functionality as must be considered and supported in the overall it can only be applied when a suitable refer- design. ence time source is available. Support for clock synchronization is a separate component from those discussed here. There is work in progress 3.1 Clock Source Management to rework the current mechanism by John Stultz and Roman Zippel. An abstraction layer and associated API are required to establish a common code framework 3.3 Time-of-day Representation for managing various clock sources. Within this framework, each clock source is required to maintain a representation of time as a monoton- The monotonically increasing time value pro- ically increasing value. At this time, nanosec- vided by many hardware clock sources cannot onds are the favorite choice for the time value be set. The generic interface for time-of-day units of a clock source. This abstraction layer representation must thus compensate for drift allows the user to select among a range of avail- as an offset to the clock source value, and rep- able hardware devices supporting clock func- resent the time-of-day (calendar or wall clock tions when configuring the system and pro- time) as a function of the clock source value. vides necessary infrastructure. This infras- The drift offset and parameters to the func- tructure includes, for example, mathematical tion converting the clock source value to a wall helper functions to convert time values specific clock value can set by manual interaction or to each clock source, which depend on prop- under control of software for synchronization erties of each hardware device, into the com- with external time sources (e.g. NTP). mon human-oriented time units used by the framework, i.e. nanoseconds. The centraliza- It is important to note that the current Linux im- tion of this functionality allows the system to plementation of the time keeping component is share significantly more code across architec- the reverse of the proposed solution. The inter- tures. This abstraction is already addressed by nal time representation tracks the time-of-day John Stultz’s work on a Generic Time-of-day time fairly directly and derives the monotoni- subsystem [5]. cally increasing nanosecond time value from it. 336 • Hrtimers and Beyond: Transforming the Linux Time Subsystems This is a relic of software development history and representation was one of the main prob- and the GTOD/NTP work is already addressing lems faced when implementing high resolution this issue. timers and variable interval event scheduling. All attempts to reuse the cascading timer wheel turned out to be incomplete and inefficient for 3.4 Clock Event Management various reasons. This led to the implementation of the hrtimers (former ktimers) subsystem. It While clock sources provide read access to provides the base for precise timer scheduling the monotonically increasing time value, clock and is designed to be easily extensible for high event sources are used to schedule the next resolution timers. event interrupt(s). The next event is currently defined to be periodic, with its period defined at compile time. The setup and selection of the event sources for various event driven func- 4 hrtimers tionalities is hardwired into the architecture de- pendent code. This results in duplicated code The current approach to timer management in across all architectures and makes it extremely Linux does a good job of satisfying an ex- difficult to change the configuration of the sys- tremely wide range of requirements, but it can- tem to use event interrupt sources other than not provide the quality of service required in those already built into the architecture. An- some cases precisely because it must satisfy other implication of the current design is that such a wide range of requirements.

Load more