Bufferbloat mitigation in the WiFi stack - status and next steps

Toke Høiland-Jørgensen Karlstad University [email protected]

Abstract

Over the last year, patches for bufferbloat mitigation and airtime fairness have landed in the WiFi stack. The bufferbloat patches improve performance across the board, and airtime fairness further improves performance in the presence of slow stations. We provide an overview of these improvements and explain some of the peculiarities of WiFi that guided the design. Following this, we outline some of the proposed next steps. This includes proposals for further reducing latency, a mechanism to set different policies for the airtime usage of stations, as well as a revamped queueing mechanism for dynamic QoS priority scheduling.

1 Introduction

Linux has been the primary target platform for community efforts to mitigate the unwanted network queueing latency known as bufferbloat. Numerous improvements have gone into the Linux networking stack, such as the FQ-CoDel qdisc, first introduced in Linux 3.5. However, the effect of applying these improvements to a WiFi link was limited until recently. This has changed dramatically over the last year, during which patches for bufferbloat mitigation and airtime fairness have landed in the WiFi stack. The bufferbloat mitigation patches reduce latency under load by an order of magnitude across the board, and an airtime fairness scheduler significantly improves network performance in the presence of slow stations. These changes are integrated directly into the mac80211 subsystem and disable the qdisc layer entirely on WiFi interfaces, to be able to effectively deal with the constraints imposed by the 802.11 MAC protocol.

In this paper we outline the properties of WiFi that have served as the basis for this design, and also describe the new queueing structure itself and the performance benefits resulting from it. In addition, we describe some of the current and planned features that are made possible because of the queueing structure. These include proposals for further reducing queueing latency, as well as a mechanism to set different policies for the airtime usage of stations, and a revamped queueing mechanism for dynamic QoS priority scheduling.

The work presented here is the result of a community effort, and I can by no means take credit for all of it. This exposition is meant as a summary of work that has mostly been published elsewhere, most notably in our previous paper [2], and on the linux-wireless and make-wifi-fast mailing lists. An important purpose of this paper and its associated talk is to solicit feedback from the community on the future directions. This also means that several of the ideas for future work are very much preliminary and subject to change.

The rest of this paper first summarises some of the constraints imposed by the 802.11 protocol, then outlines the previous improvements and their benefits, and finally outlines some ideas for future improvements.

2 802.11 MAC protocol constraints

There are three main constraints to take into account when designing a queueing scheme for WiFi. First, we must be able to handle aggregation; in 802.11, packets are assigned different Traffic Identifiers (TIDs) (typically based on their DiffServ markings [5]), and the standard specifies that aggregation be performed on a per-TID basis. Second, we must have enough data processed and ready to go when the hardware wins a transmit opportunity; there is not enough time to do a lot of processing at that time. Third, we must be able to handle packets that are re-injected from the hardware after a failed transmission; these must be re-transmitted ahead of other queued packets, as transmission can otherwise stall due to a full Block Acknowledgement Window.

The need to support aggregation, in particular, has influenced the new queueing design in mac80211. A generic packet queueing mechanism, such as that in the qdisc layer, does not have the protocol-specific knowledge to support the splitting of packets into separate queues, as is required for aggregation. And introducing an API to communicate this knowledge to the qdisc layer would impose a large complexity cost on this layer, to the detriment of network interfaces that do not have the protocol-specific requirements. So rather than modifying the qdisc layer, our queue management scheme bypasses it completely, and instead incorporates the smart queue management directly into mac80211. The main drawback of doing this is, of course, a loss of flexibility: with this design, there is no longer a way to turn off the smart queue management completely, and it does add some overhead to the packet processing. However, the performance benefits by far outweigh the costs.

Algorithm 1 802.11 queue management algorithm - enqueue.
 1: function ENQUEUE(pkt, tid)
 2:   if QUEUE_LIMIT_REACHED() then           ▷ Global limit
 3:     drop_queue ← FIND_LONGEST_QUEUE()
 4:     DROP(drop_queue.head_pkt)
 5:   queue ← HASH(pkt)
 6:   if queue.tid ≠ NULL and queue.tid ≠ tid then
 7:     queue ← tid.overflow_queue            ▷ Hash collision
 8:   queue.tid ← tid
 9:   TIMESTAMP(pkt)                          ▷ Used by CoDel at dequeue
10:   APPEND(pkt, queue)
11:   if queue is not active then
12:     LIST_ADD(queue, tid.new_queues)
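As a rough illustration of the enqueue path in Algorithm 1, the following self-contained Python sketch models the fixed queue pool, the cross-TID hash-collision overflow queue, and the drop-from-longest-queue behaviour on overload. All names and constants are illustrative, not taken from the kernel code, and the active-queue bookkeeping (lines 11-12 of the algorithm) and CoDel timestamping are reduced to comments.

```python
# Illustrative model of the Algorithm 1 enqueue path (not kernel code).
from collections import deque

NUM_QUEUES = 4096   # fixed total number of flow queues (illustrative)
QUEUE_LIMIT = 8192  # global packet limit across all queues

class Queue:
    def __init__(self):
        self.pkts = deque()
        self.tid = None  # TID this queue is currently assigned to

queues = [Queue() for _ in range(NUM_QUEUES)]
overflow = {}        # per-TID overflow queues, created on demand
total_pkts = 0

def enqueue(pkt, tid, flow_hash):
    global total_pkts
    if total_pkts >= QUEUE_LIMIT:
        # Global limit reached: drop from the globally longest queue,
        # so a single bulk flow cannot lock out the others on overload.
        longest = max(queues + list(overflow.values()),
                      key=lambda q: len(q.pkts))
        longest.pkts.popleft()
        total_pkts -= 1
    q = queues[flow_hash % NUM_QUEUES]
    if q.tid is not None and q.tid != tid:
        # Hash collision across TIDs: divert to this TID's overflow queue.
        q = overflow.setdefault(tid, Queue())
    q.tid = tid
    # A real implementation would timestamp pkt here for CoDel, and add
    # the queue to the TID's list of new queues if it just became active.
    q.pkts.append(pkt)
    total_pkts += 1

# Example: two different TIDs whose flows hash to the same queue index.
enqueue("pkt-a", tid=0, flow_hash=5)
enqueue("pkt-b", tid=1, flow_hash=5)  # collision: goes to TID 1's overflow
print(len(queues[5].pkts), len(overflow[1].pkts))  # 1 1
```

Note that, as in the algorithm, two flows with the same TID that hash to the same queue simply share it; only a cross-TID collision is diverted.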

3 New queueing structure

This section is based on the description and evaluation in our previous paper [2] describing these changes. Please see that for further details and a more thorough evaluation.

The new WiFi queue management scheme is built on the principles of the FQ-CoDel qdisc. Because FQ-CoDel allocates a number of sub-queues that are used for per-flow scheduling, simply assigning a full instance of FQ-CoDel to each TID is impractical. Instead, we modify FQ-CoDel to make it operate on a fixed total number of queues, and group queues based on which TID they are associated with. When a packet is hashed and assigned to a queue, that queue is in turn assigned to the TID the packet is destined for. In case that queue is already active and assigned to another TID (which means that a hash collision has occurred¹), the packet is instead queued to a TID-specific overflow queue.

¹A hash collision can of course also mean that two flows assigned to the same TID end up in the same queue. In this case, no special handling is needed, and the two flows will simply share a queue as in any other hash-based fairness queueing scheme.

The mac80211 stack does a lot of per-packet processing specific to the 802.11 protocol, which involves building headers, assigning sequence numbers, optional fragmentation, encryption, etc. Some of these operations are sensitive to reordering (notably, if sequence numbers or encryption IVs are out of order, packets will be discarded by the receiver). Since per-flow queueing can cause reordering when more than one flow is active, doing this packet processing before the queueing step would result in packets being reordered on the wire. On the other hand, applying everything at dequeue takes up valuable time in the loop that builds aggregates, which is time sensitive. To strike a balance between these factors, the mac80211 transmission handling is split into two parts, with the latter part (containing all reorder-sensitive actions) applied after the dequeue step, just before the packet is handed to the driver.

The obvious way to handle the two other constraints mentioned above (keeping the hardware busy, and handling retries) is, respectively, to add a small queue of pre-processed aggregates, and to add a separate priority queue for packets that need to be retried. And indeed, this is how the ath9k driver already handled these issues, so we simply keep this. The resulting queueing structure is depicted in Figure 1.

Figure 1: Our 802.11-specific queueing structure, as it looks when applied to the Linux WiFi stack. (Diagram not reproduced here. It shows packets bypassing the qdisc layer, being assigned a TID and split into per-flow FQ-CoDel queues under a global 8192-packet limit in the MAC layer; the ath9k driver then round-robins over per-TID retry and priority queues to build up to two queued aggregates for the hardware FIFO.)
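The reordering problem that motivates this split can be demonstrated with a toy model (plain Python, no relation to the actual mac80211 code): if sequence numbers are assigned before packets enter the per-flow queues, round-robin dequeueing across two flows hands them to the "driver" out of order.

```python
# Toy demonstration of why reorder-sensitive processing (e.g. sequence
# number assignment) must happen after dequeue, not before enqueue.
from collections import deque
from itertools import cycle

def transmit(assign_at_enqueue):
    flows = {"a": deque(), "b": deque()}
    seqno = 0
    # A burst on flow "a" arrives, then a burst on flow "b".
    for flow in ("a", "a", "b", "b"):
        if assign_at_enqueue:
            flows[flow].append(seqno)   # number assigned too early
            seqno += 1
        else:
            flows[flow].append(None)    # number assigned at dequeue
    wire = []
    # Per-flow fair queueing dequeues the two flows round-robin.
    for flow in cycle("ab"):
        if not any(flows.values()):
            break
        if flows[flow]:
            pkt = flows[flow].popleft()
            if pkt is None:             # just before handing to the driver
                pkt = seqno
                seqno += 1
            wire.append(pkt)
    return wire

print(transmit(assign_at_enqueue=True))   # [0, 2, 1, 3] -- reordered
print(transmit(assign_at_enqueue=False))  # [0, 1, 2, 3] -- in order
```

A receiver that discards out-of-order sequence numbers (or encryption IVs) would drop packets in the first case, which is why mac80211 applies these steps after the dequeue step instead.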
A global queue size limit is kept, and when this is exceeded, packets are dropped from the globally longest queue, which prevents a single flow from locking out other flows on overload. The enqueue logic is shown in Algorithm 1.

The lists of active queues are kept in a per-TID structure, and when a TID needs to dequeue a packet, the FQ-CoDel scheduler is applied to the TID-specific lists of active queues. This is shown in Algorithm 2.

Algorithm 2 802.11 queue management algorithm - dequeue.
 1: function DEQUEUE(tid)
 2:   if tid.new_queues is non-empty then
 3:     queue ← LIST_FIRST(tid.new_queues)
 4:   else if tid.old_queues is non-empty then
 5:     queue ← LIST_FIRST(tid.old_queues)
 6:   else
 7:     return NULL
 8:   if queue.deficit ≤ 0 then
 9:     queue.deficit ← queue.deficit + quantum
10:     LIST_MOVE(queue, tid.old_queues)
11:     restart
12:   pkt ← CODEL_DEQUEUE(queue)
13:   if pkt is NULL then                     ▷ queue empty
14:     if queue ∈ tid.new_queues then
15:       LIST_MOVE(queue, tid.old_queues)
16:     else
17:       LIST_DEL(queue)
18:     queue.tid ← NULL
19:     restart
20:   queue.deficit ← queue.deficit − pkt.length
21:   return pkt

3.1 Airtime fairness

Once we have the managed per-TID queues, it becomes straightforward to solve another long-standing WiFi performance problem: the so-called "performance anomaly". This anomaly is a well-known property of WiFi networks: if devices on the network operate at different rates, the MAC protocol will ensure throughput fairness between them, meaning that all stations will effectively transmit at the lowest rate. The anomaly was first described in 2003, and several mitigation strategies have been proposed in the literature (e.g., [3, 6]). However, none of these have been implemented in Linux until now.

Because we have a per-TID managed queueing structure, solving the performance anomaly is simply a matter of scheduling transmissions from each of the TIDs in a manner that ensures that each station gets its fair share of airtime. The managed queues will then automatically provide the needed back-pressure on flows to each station to ensure latency stays low. The implementation in the ath9k driver adds accounting of the airtime used by each station, using a time deficit counter that is analogous to the FQ-CoDel byte deficit assigned to each flow. The airtime is reported by the hardware on TX completion, and on the receive side it is calculated from the packet size and the rate it was transmitted at.

Once this deficit is in place, an airtime fairness scheduler can be added to the driver. This is shown in Algorithm 3. It is similar to the FQ-CoDel dequeue algorithm, with stations taking the place of flows, and the deficit being accounted in microseconds instead of bytes. The two main differences are (1) that the scheduler function loops until the hardware queue becomes full (at two queued aggregates), rather than just pulling a single packet from the queue; and (2) that when a station is chosen to be scheduled, it gets to build a full aggregate rather than a single packet.

Algorithm 3 Airtime fairness scheduler. The SCHEDULE function is called on packet arrival and on transmission completion.
 1: function SCHEDULE
 2:   while hardware queue is not full do
 3:     if new_stations is non-empty then
 4:       station ← LIST_FIRST(new_stations)
 5:     else if old_stations is non-empty then
 6:       station ← LIST_FIRST(old_stations)
 7:     else
 8:       return
 9:     deficit ← station.deficit[pkt.qoslvl]
10:     if deficit ≤ 0 then
11:       station.deficit[pkt.qoslvl] ← deficit + quantum
12:       LIST_MOVE(station, old_stations)
13:       restart
14:     if station's queue is empty then
15:       if station ∈ new_stations then
16:         LIST_MOVE(station, old_stations)
17:       else
18:         LIST_DEL(station)
19:       restart
20:     BUILD_AGGREGATE(station)

3.2 Performance

We evaluate our modifications in a testbed setup consisting of five PCs: three wireless clients, an access point, and a server located one Gigabit Ethernet hop from the access point, which serves as source and sink for the test flows. The evaluation is repeated for these four scenarios:

FIFO: The 4.6 kernel from kernel.org modified only to collect the airtime used by stations, running with the default pfifo_fast qdisc installed on the wireless interface.

FQ-CoDel: As above, but using the FQ-CoDel qdisc on the wireless interface.

FQ-MAC: Kernel patched to include the FQ-CoDel based intermediate queues in the MAC layer.

Airtime fair FQ: As FQ-MAC, but additionally including our airtime fairness scheduler in the ath9k driver.

We show only three sets of results here: the latency improvements from the queue management changes, and the achieved fairness and gain in total network throughput from the airtime fairness scheduler. For the full evaluation, see the previously published paper [2].

Figure 2 shows the results of our ICMP latency measurements with simultaneous TCP download traffic. Here, the FIFO case shows several hundred milliseconds of latency when the link is saturated by a TCP download. FQ-CoDel alleviates this somewhat, but the slow station still sees latencies of more than 200 ms in the median, and the fast stations around 35 ms. With FQ-MAC, this is reduced so that the slow station now has the same median latency as the fast one does in the FQ-CoDel case, while the fast stations get their latency reduced by another 45%. The airtime scheduler doesn't improve further upon this (other than to alter the shape of the distribution slightly for the slow station, while retaining the same median), so we have omitted it from the figure.

Figure 3 shows the airtime usage of each of the three stations in the different scenarios. When the airtime scheduler is enabled, perfect fairness is achieved. Figure 4 shows how this translates to throughput for downstream TCP traffic. For this case, the fast stations increase their throughput as fairness goes up, and the slow station decreases its throughput. The total effect is a net increase in throughput. The increase from the FIFO case to FQ-CoDel and FQ-MAC is due to better aggregation for the fast stations. This was observed for UDP as well in the case of FQ-MAC, but for FQ-CoDel the slow station would occupy all the queue space in the driver, preventing the fast station from achieving full aggregation. With the TCP feedback loop in place, this lock-out behaviour is lessened, and so the fast stations increase their throughput.

As these results clearly show, there are significant performance gains from the changes already included in the kernel.
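The effect of a microsecond-based deficit can be illustrated with a simplified, self-contained model in the spirit of Algorithm 3. This is a plain deficit round-robin over stations; it omits the new/old station lists, the hardware queue, and the per-QoS-level deficits, and all rates and sizes are illustrative.

```python
# Simplified model of deficit-based airtime scheduling (illustrative,
# not the ath9k implementation): deficits are kept in microseconds, and
# a station may only transmit while its deficit is positive.

QUANTUM_US = 1000  # deficit refill per scheduling round (illustrative)

def run_schedule(rates_mbps, rounds=1000, aggr_bytes=8000):
    stations = [{"rate": r, "deficit": 0, "airtime": 0.0, "bytes": 0}
                for r in rates_mbps]
    for _ in range(rounds):
        for st in stations:
            st["deficit"] += QUANTUM_US
            while st["deficit"] > 0:
                # Airtime (usec) to send one aggregate at this station's
                # rate: bits divided by Mbit/s gives microseconds.
                tx_us = aggr_bytes * 8 / st["rate"]
                st["deficit"] -= tx_us
                st["airtime"] += tx_us
                st["bytes"] += aggr_bytes
    return stations

# One fast (100 Mbps) and one slow (10 Mbps) station:
fast, slow = run_schedule([100, 10])
print(round(fast["airtime"] / slow["airtime"], 2))  # ~1: equal airtime
print(round(fast["bytes"] / slow["bytes"], 1))      # ~10: throughput
                                                    # follows the rate
```

With airtime fairness, each station's throughput stays proportional to its own rate, instead of all stations collapsing to the throughput of the slowest one, which is the performance anomaly described above.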

Figure 2: Latency (ICMP ping) with simultaneous TCP download traffic. (CDF plot not reproduced here; it compares the slow and fast stations under FIFO, FQ-CoDel and FQ-MAC.)

Figure 3: Airtime usage for each of the three stations in the different scenarios (UDP traffic).

Figure 4: Throughput for TCP download traffic (to clients).

4 Next steps

Currently, only the ath9k and ath10k drivers (and the out-of-tree mt76 driver) benefit from the intermediate queue structure, and the airtime fairness scheduler is ath9k only. There is work underway to remedy this: patches exist to move the airtime scheduling into mac80211,² and Johannes Berg has outlined a plan for converting the mac80211 code completely to using the intermediate queueing structure,³ which will allow all drivers to benefit from it. Integration into the common mac80211 layer is important to ensure that these features, and further developments based upon them, can benefit all drivers with as little effort as possible.

²https://lkml.kernel.org/r/20171016160902.[email protected]
³https://lkml.kernel.org/r/1507217947.2387.[email protected]

The rest of this section outlines some of the ideas for future improvements that I and others in the community have discussed. These are all mostly at the idea stage for the time being, and part of the purpose of listing them here is to gather feedback on their usefulness and viability. The upcoming improvements are grouped into three parts: further reducing latency, airtime policies, and QoS handling.

4.1 Further reducing latency

While we have reduced queueing latency by an order of magnitude, there are still some areas where additional reductions should be feasible. This can be seen from the fact that it is still possible to lower latency by aggressively shaping the traffic on top of the wireless interface (thus moving the bottleneck somewhere else). While not all of these ideas are applicable to all devices (because some of the required functionality may be in firmware), they can help where implementing them is possible. The areas of potential improvement are:

Minimising hardware queue length. Currently, most drivers keep a fixed amount of data buffered ready to send to the hardware. For drivers that handle aggregation in firmware, this is in the form of a fixed number of packets, while for drivers that build aggregates in software (mainly ath9k), it is a fixed number of full aggregates. In the former case, a BQL-like mechanism is needed to limit this buffering to a minimum. Drivers that do aggregation in software have enough information available that another approach may be viable: deferring scheduling of the next aggregate until right before the expected completion time of the previous one. CPU scheduling granularity may be a limiting factor in this, and getting the timing right in general can be tricky. In both cases, getting an interrupt from the hardware when transmission starts would be helpful to achieve more precise timing.

Reducing the number of retransmissions. Instead of using a fixed maximum number of retransmissions for all packets, dynamically setting the number of retransmissions based on the data rate can reduce latency by reducing head-of-line blocking while the retransmissions are ongoing. This has been shown to work in a fairly simple implementation [4]. Using a time-based limit, and optionally including the queue time as recorded by CoDel, could be a way to achieve this.

Dynamic aggregate sizing. Rather than always attempting to build the largest possible aggregates, when there are many stations with data outstanding it may be worthwhile to reduce the size of the aggregates (to, say, 1 ms) to trade off lower latency against airtime efficiency (and so throughput). Exploration of when this is a good tradeoff remains to be done.

4.2 Airtime policies

Having airtime accounting information available when scheduling stations makes it possible to implement scheduling options other than strict fairness. For instance, in some cases it may be desirable to allow a slow station to exceed its fair share of airtime in order to get it above a minimum bit rate. Or it may be desirable to limit a subset of active stations to a fraction of the available airtime (to implement a limited guest network, for instance).

There are multiple ways to extend the scheduler to support such use cases. The two most straightforward are the following:

Grouping queues and scheduling the groups. The current deficit-based scheduler simply goes round-robin between all active queues until it finds one that has a positive airtime deficit, which is then scheduled (while the others have their deficit increased by one quantum every time the scheduler passes over them). Introducing groups that are scheduled together (possibly recursively, and possibly with weights) is a way to enable many different policies, and allows configuration by letting userspace configure the groups. The drawback is that it requires a non-trivial configuration to achieve the desired results, especially as stations come and go.

Arbitrarily dividing airtime between stations. Different policies could be implemented by dividing credits unevenly, allowing userspace to specify an arbitrary division of credits. However, specifying a good API for this is difficult. Using BPF could be an option, but without being able to loop over the active stations, it may not be possible to realise the arbitrary division of airtime that is needed for this to work.

Performance issues. Adding features to the scheduler to make it support policies is bound to make it necessary to do more work on each invocation of the scheduler, which can be problematic in the fast path. A way to fix this is to queue the airtime usage information on per-CPU queues in the fast path, and have a separate thread collect the information and perform scheduling. For this, it may be necessary to switch the scheduler to use airtime credits that are divided between stations, similar to [1], as that would simplify the decision making in the fast path.

4.3 QoS handling

At present, the four different 802.11e QoS levels are basically scheduled in strict priority order, mapping to QoS queues is done by DiffServ marking, and there is no admission control for the higher priority levels. At the same time, with the improved queueing structure, the end-to-end latency for a voice flow marked as best-effort can be as good as for one that uses the VO queue.

Improved network efficiency. There is a potential for improving network efficiency by including queued VO packets at the head of aggregates sent at a lower priority level (since the VO level doesn't allow aggregation). Using current queue occupancy as a factor in deciding when to do this is important, but information on the level of media contention is also needed, as the more aggressive media contention parameters of the VO priority are important in high-contention scenarios. More experimentation is needed to figure out exactly in which cases this optimisation makes sense.

Admission control. Another issue with QoS handling is that it is based on DiffServ markings and that there is no admission control. This means that it is quite possible to send large data flows at the higher priority levels, which can hurt performance of the network. Previous work in the Cake scheduler⁴ has shown promising results in applying a soft admission control, where flows that build a large queue at the higher priority levels are dynamically demoted to best effort (or even background). Compared to a strict rate-based admission control, this has the advantage that it adapts automatically to the current performance of the network, at the cost of less predictability for applications.

⁴https://github.com/dtaht/sch_cake

Interaction with airtime fairness. In some cases, a station with high priority packets outstanding needs to be throttled to achieve fairness, even though all other stations only have packets of a lower priority queued. In this case, QoS prioritisation and airtime fairness enforcement conflict with each other: if fairness is enforced, lower priority packets will be transmitted before those of higher priority. In most cases it is likely beneficial to enforce fairness first (which can be achieved by first selecting the station to transmit to, and then picking the highest priority queue active for that station), but this may lead to problems with regulatory compliance. We plan to explore this further going forward.

4.4 Configurability and extraction of statistics

The queueing structure and airtime fairness scheduler as they are currently implemented offer very little configurability, and few statistics on their operation. What little is available is only accessible through the debugfs interface, which is often disabled on production systems. While a lot of emphasis has been put on making sure that these features work in the absence of operator tuning, in some cases tuning can still be necessary. Having tuning and statistics available in a way that integrates well with existing tooling is something that not a lot of thought has been put into as of now. The available options and statistics are as follows:

• Configuration (per phy):
  – FQ knobs (packet/memory limit, quantum)
  – Airtime flags (whether to count airtime on TX/RX)

• Statistics:
  – FQ per-TID stats (drops/marks/bytes etc.; per station)
  – FQ multicast stats (per vif)
  – Airtime stats (TX/RX usecs, deficit; per station)

Going forward, these will all be incorporated into the nl80211 netlink interface (with the possible exception of the airtime flags, which are mostly a debug setting). For some things, this is fairly straightforward (for instance, airtime statistics can be included along with the byte statistics already available in the mac80211 netlink interface and shown by the iw tool), while less so for others. As an example of the latter, a new API needs to be hashed out for the upcoming policy aspects of the airtime scheduler (which will require userspace support to set the policy).

5 Conclusion

This paper summarises the work that has gone into the WiFi stack over the last year to reduce bufferbloat and improve airtime fairness. These improvements have been realised by replacing the queueing structure and station scheduling of the mac80211 layer. We have also outlined some promising avenues for future work based on these new features, which we plan to pursue in the coming months.

6 Acknowledgements

Several people were indispensable in making this work happen. First and foremost my co-authors on the original paper, Michał Kazior, Dave Täht, Per Hurtig and Anna Brunstrom. Michał wrote most of the code for the mac80211 queueing structure, and the others were instrumental in helping with testing, debugging, experimental design and feedback. Johannes Berg, Kalle Valo and Felix Fietkau have offered excellent feedback on the code itself, and many members of the OpenWrt/LEDE and Make-wifi-fast communities provided invaluable feedback and testing. Finally, a general thank you to the whole netdev community for being welcoming and generally awesome, in virtual as well as in real life.

References

[1] Garroppo, R. G.; Giordano, S.; Lucetti, S.; and Tavanti, L. Providing air-time usage fairness in IEEE 802.11 networks with the deficit transmission time (DTT) scheduler. 13(4):481–495.

[2] Høiland-Jørgensen, T.; Kazior, M.; Täht, D.; Hurtig, P.; and Brunstrom, A. 2017. Ending the anomaly: Achieving low latency and airtime fairness in WiFi. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), 139–151. Santa Clara, CA: USENIX Association.

[3] Joshi, T.; Mukherjee, A.; Yoo, Y.; and Agrawal, D. Airtime fairness for IEEE 802.11 multirate networks. 7(4):513–527.

[4] Nomoto, M.; Wu, C.; Ohzahata, S.; and Kato, T. 2017. Resolving bufferbloat in TCP communication over IEEE 802.11n WLAN by reducing MAC retransmission limit at low data rate. ICN 2017, 80.

[5] Szigeti, T., and Baker, F. 2016. DiffServ to IEEE 802.11 Mapping. Internet Draft (standards track).

[6] Tan, G., and Guttag, J. V. Time-based fairness improves performance in multi-rate WLANs. In USENIX Annual Technical Conference, General Track, 269–282.