The State of Linux Power Management 2006
Total Page:16
File Type:pdf, Size:1020Kb
The State of Linux Power Management 2006 Patrick Mochel Intel Corporation [email protected] Abstract The total power consumption of a computer that we work with in this paper is measured simply by the power consumption of each of Power Management, as it relates to Operat- its components. (The amount of additional ing Systems, is the process of regulating the power needed to compensate for inefficiencies amount of power consumed by a computer. It is of power supplies and dissipation is not cov- a feature that is available in every type of com- ered.) In a computer in which the power con- puter that Linux runs, though the expectations sumption of each device remains constant over and the implementations of power management time, then total power consumption can be vary greatly depending the makeup of the plat- measured like this: form and the type of software running on it. This paper provides an analysis of power man- agement and how it relates to the Linux kernel. Cp = D1p + D2p + D3p + ... It provides a complete (though not comprehen- sive) survey of power management features, the policy to control those features, and user con- However, hardware features cause the amount straints of the policy. of power consumed by a device to fluctuate over time. So, the amount of power consumed by a device over a set amount of time (e.g. one hour) can be measured as the sum of power 1 Power Management Introduction consumed by the device at each interval (e.g. one millisecond) during that time. So, the equa- tion becomes a bit more involved for each de- Power management is the effort to minimize vice: the amount of power used over time, as mea- sured in watts (or fraction thereof). Technically, a watt is defined as one joule of D1p = (D1p0 + D1p1 + D1p2 ... D1pn) / n energy per second, but it is often used in refer- ence to the amount of power consumed in one hour of time. For example, the statement “This The rates of power consumption for a device computer has a 300W power supply” refers to are expressed in watts/hour, so the amount of the maximum amount of power consumed over power consumed per interval must be divided the course of an hour. [watt] by the number of intervals sampled. 152 • The State of Linux Power Management 2006 2 Device Power Management 2.2 Frequency modulation Each device in a computer has the potential to The frequency is a function of the voltage com- perform a certain of work over time: ing into the device and a set of multipliers (among other things). In some cases, the mul- Work = w0 + w1 + w2 + ... + wn tipliers can be adjusted to simply affect the speed of the device, though that doesn’t offer If the power consumption of a device is con- great savings in power because the device is stant, then it consumes the same amount of still drawing the same amount of current from power at each interval, regardless of whether the power source. But, by adjusting the volt- or not it is actually performing its potential age, the actual power draw changes, allowing maximum amount of work possible. Hardware for better power savings. power management features provide the abil- Technically speaking, the device will have a ity to adjust the amount of power that a de- range of frequencies that it can operate at vice consumes based on the amount of work for each voltage supplied. By adjusting the that the device needs to perform. The concept voltage, what is really being adjusted is the is based the basic observation that devices are min/max range of frequencies, given the multi- often underutilized—the amount of work de- pliers available. Within each voltage level, the manded from them may be less than the sup- multipliers can then be adjusted to achieve the ply that their potential offers. That general ob- desired frequency [Pentium M]. servation provides two more specific theories about work load and power consumption. The caveat of changing voltage levels as op- posed simply changing frequency levels is that • A device may operate at a slower rate or the voltage may also effect the multipliers that experience longer periods of idleness in are used to determine the bus speed between order to perform the work demanded of it. that device and another device (e.g. between the Please see Section 2.1. CPU and RAM). There is a higher latency in- volved in changing voltages because those val- • Part or all of a device may be disabled if ues must be changed and synchronized in lock- there is no work for it to do. Please see step. section 2.5. By being able to operate at different frequen- 2.1 Operable States cies at the same voltage level, and often the same frequency at different voltage levels, the power consumed can be minimized, but still Approaches to reduce power consumption of allow for low-latency transitions between fre- a device, yet all the device to continuously quencies. perform work are commonly implemented by CPUs, which have a near-constant demand for their resources. They accomplish this by doing 2.3 Linux support for frequency modula- one or both of the following: tion • Frequency modulation. Many modern CPUs support frequency modu- lation, and the Linux cpufreq subsystem has de- • Low-power idle states. veloped intelligent support for predicting and 2006 Linux Symposium, Volume Two • 153 Architecture CPUs device, but ensures that it will automatically x86/x86-64 Generic ACPI transition out of the low-power state when it re- AMD PowerNow ceives a request for work. This feature is most Intel SpeedStep Transmeta Crusoe famously implemented by x86 CPUs and speci- NatSemi Geode / Cyrix MediaGX fied by ACPI as the Processor C states: C0, C1, VIA Longhaul C2 . Cn [ACPI]. ARM Integrator SA1100 SA1110 Low-power CPU idle states are designed for PowerPC Various G3s & G3s use during the idle thread. How they imple- Sparc64 UltraSPARC IIe & III SuperH SH-3, SH-4 mented, and how they are used, is dependent on the platform. C1 on x86 CPUs may entered Table 1: Architectures Supported by cpufreq by simply executing the ’hlt’ instruction. Us- ing any low-power states beyond that requires very specific manipulation of the hardware. adjusting CPU load. cpufreq has support for ACPI provides an abstract interface for doing many CPUs across several architectures, as this on supported platforms, which is imple- seen in Table 1. Support for a specific CPU mented in the ACPI idle thread [ACPI Docs], requires a driver that understands the model- though other platforms must do it manually specific interface for determining and entering [DPM WP]. the supported states [cpufreq]. Other devices that experience continuous de- Each C state has a tradeoff between the power mand for resources could benefit from being it consumes and the latency for returning to C0. able to modulate their frequency depending on Depending on the demand for the CPU (as mea- the amount of demand. In particular, mem- sured by the amount of time spent idle), a lower ory and I/O buses are both large consumers or higher C state may entered, since a lightly- of power that could frequently experience un- loaded system can safely endure slightly higher der utilization of their resources. However, no latencies. known hardware features (let along Linux sup- port) exist for these items. This is not surpris- Low-power idle states are starting to be im- ing, as their software control would most likely plemented by other devices that are under fre- involve platform-specific commands and coor- quent demand, but also experiences frequent dination that is currently performed only by the periods of idleness. In particular, some PCI firmware (and beyond the knowledge and inter- Express chipsets have implemented low-power est of the kernel). idle states for its device links (connections be- tween the PCIe controller and downstream de- 2.4 Low-power idle states vices). These states are called L States, and may be used in two cases: when the down- stream device is active but idle (called Active A period of idleness is a small, sometimes State Power Management); or when the down- fixed, amount of time that a device is not be- stream device is in a low-power state (called ing used. Some devices may be able to trans- Link Power States). This feature is not yet parently enter a special low-power state during supported by Linux. More information can be this period. This state effectively disables the found in the References [PCIe PM]. 154 • The State of Linux Power Management 2006 2.5 Inoperable States A range was specified to allow hardware de- signers the ability to choose alternative imple- mentations in which more components of the Devices that are not needed for long periods of device lost power as a deeper power state was time may be put into an inoperable power state entered. This would allow a lower transition la- in which power to some or all of the device is tency from the low-power state to the D0 state removed. These states are typically used for pe- because an entire device reset would not be nec- ripheral devices that experience long periods of essary. However, few hardware designers have idleness or are known to be unneeded for a sig- found that tradeoff worthwhile as very devices nificant period of time (in a magnitude of sec- support D1 and D2. onds and higher). There are two basic classes of inoperable power states: where part of the de- An exception are graphics controllers, which vice is inoperable, and where the entire device are the biggest users of the intermediate power is inoperable.