Link Level Protocols Revisited

Phil Karn, KA9Q
Brian Lloyd, WB6RQN

ABSTRACT

The LAPB protocol on which the connected mode of AX.25 is based was originally designed for point-to-point wire links, not shared multiple-access radio channels. This paper discusses the deficiencies of LAPB in the radio environment and suggests several improvements. These include the simple (adjusting existing TNC parameters), the moderate (upward compatible implementation changes) and the radical (replacing LAPB altogether with a simpler and inherently much more efficient protocol). We believe that these approaches deserve serious attention by the amateur packet community. The suggested AX.25 congestion control techniques should be used as soon as possible in existing networks, while the new “ACK-ACK” protocol should be considered in the design of backbone networks and eventually user-network links.

1. AX.25 in Review

The Amateur Radio Link Layer Protocol AX.25 [1] actually consists of two distinct sublayers. The upper sublayer is a connection-oriented protocol almost identical to the Link Access Procedures Balanced (LAPB) from CCITT X.25 [2]. Prepended to the LAPB control field in AX.25, however, is a special address field containing ASCII-encoded amateur radio callsigns and substation IDs (SSIDs). Each AX.25 frame contains, at a minimum, the callsign/SSID of the sender and intended receiver of the frame. Optionally, it may also contain a sender-specified list of digital repeaters, or digipeaters, through which the frame should be routed to its destination.

This additional lower sublayer contains most of the changes made to enable the protocol to function in an amateur radio environment. An AX.25 address header always contains full source and destination addresses. Therefore it meets the definition of a “datagram” protocol, and we will call this field the “datagram sublayer” of AX.25.

The LAPB (upper) sublayer of AX.25 belongs to the general class of “ARQ” (automatic request-repeat) protocols. It uses receiver acknowledgements (“ACKs”) along with sender timeouts and retransmission to increase the reliability of the service provided to higher layer protocols above that of the raw datagram sublayer.

The performance of an ARQ protocol depends heavily on how closely its design assumptions are met in practice. LAPB was designed for a full-duplex, point-to-point channel with low bit error rates; thus it performs poorly when used on a half-duplex, multiple-access radio channel with a high error rate. The remainder of this paper will discuss the special requirements of the amateur environment, analyze LAPB with respect to these requirements, and propose several alternatives. These range from a set of backward-compatible provisions to improve the LAPB retransmission algorithms in the presence of channel congestion, to the specification of an entirely new, connectionless link level control protocol that is both easier to implement (especially in the “multiple connect” case) and inherently much more efficient than LAPB.
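To make the datagram sublayer described at the start of this section more concrete, the following Python sketch builds an address header from callsign/SSID pairs. It follows the AX.25 address layout as we understand it (six space-padded callsign characters, each shifted left one bit, followed by an SSID octet whose low bit marks the last address). The function names `encode_address` and `encode_header` are hypothetical, and the command/response and has-been-repeated bits are simply left at zero, so this is an illustration rather than a complete encoder.

```python
def encode_address(callsign: str, ssid: int = 0, last: bool = False) -> bytes:
    """Encode one 7-octet address (callsign plus SSID). Hypothetical helper."""
    if not 0 <= ssid <= 15:
        raise ValueError("SSID must be in 0..15")
    call = callsign.upper().ljust(6)  # pad short callsigns with spaces
    if len(call) != 6:
        raise ValueError("callsign longer than 6 characters")
    # Each ASCII character is shifted left one bit so that bit 0 of every
    # octet stays free for the address-extension flag.
    octets = [ord(ch) << 1 for ch in call]
    # SSID octet: two reserved bits set (0x60), SSID in bits 1-4,
    # bit 0 set only on the final address of the header.
    octets.append(0x60 | (ssid << 1) | (1 if last else 0))
    return bytes(octets)


def encode_header(dest, src, digipeaters=()):
    """Destination first, then source, then the optional digipeater list."""
    path = [dest, src, *digipeaters]
    return b"".join(
        encode_address(call, ssid, last=(i == len(path) - 1))
        for i, (call, ssid) in enumerate(path)
    )
```

For example, `encode_header(("WB6RQN", 0), ("KA9Q", 1), [("RELAY", 2)])` yields a 21-octet header. Because every frame carries such a header in full, the sublayer can be handled as a datagram service.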

2. Packet Radio Protocol Requirements

A good packet radio protocol must do four things:

1. Deliver the user’s data as reliably as possible.
2. Deliver the user’s data as quickly as possible.
3. Use the shared channel capacity as efficiently as possible.
4. Be reasonably easy to implement.

Naturally, these goals conflict. For example, on a full-duplex, point-to-point channel, the third consideration does not apply. The protocol is free to use as much transmission bandwidth as it chooses to maximize the other criteria, since it would otherwise go to waste. On a shared packet radio channel, however, a socially well-adjusted protocol should attempt to use as little channel bandwidth as possible to move a given amount of data, even at the expense of increased user delays. As we will see, this makes several features of LAPB undesirable.

3. Sliding Windows vs Stop-and-Wait

LAPB is a sliding window protocol. This means that it is possible to send more than one frame before an ACK is received for the first outstanding packet. The maximum number of frames that may be transmitted before the sender must stop and wait for an ACK is known as the window size. For the standard form of LAPB used in AX.25, the window size cannot exceed 7. Most implementations allow the user to decrease this limit further (e.g., the TAPR MAXFRAME option).

3.1. Pipelining

Sliding window protocols are popular because they allow the full utilization of high speed, full duplex circuits with long round trip delay. For example, a satellite path has a round trip delay of .5 seconds. If only one frame can be sent at a time, the transmitter must remain idle at least .5 seconds before an ACK could return, enabling it to transmit another frame. While the effect of this delay can be reduced by making each frame very long, it cannot be eliminated unless more than one frame can be “in the pipe” at a time.

Pipelining causes major problems, however, when the channel is half duplex. Since the receiver usually attempts to acknowledge incoming data as soon as the channel becomes clear, it is likely to collide with the sender if the sender attempts to send additional data without first waiting for the ACK for the data already sent.¹ Pipelining on a half duplex radio channel is therefore a good example of a socially undesirable technique that attempts to reduce user delay at the expense of overall channel efficiency.

¹ Note that TNCs capable of “multi-connect” operation (i.e., able to manage simultaneous connections with several different stations on the same channel) should allow unacknowledged data on only one connection at a time.

3.2. Transmission Efficiency

Given that a user wants to send a certain amount of data, a packet radio controller must control several parameters to packetize that data for transmission most efficiently.

Since it operates on a half-duplex channel, the controller must first decide how much to send on each transmission before switching to receive and waiting for an ACK. Sending more on each transmission reduces channel overhead by allowing each returning ACK to “cover” more data. It also increases the maximum throughput attainable on a channel with a given propagation delay, since this rate can never exceed one window full per round trip interval. On the other hand, smaller transmissions are smaller targets for noise and interference.

Second, once it has decided to send a certain amount of data in a transmission, the data must be divided up into one or more consecutive frames.² The fewer frames used, the smaller the effects of frame overhead. On the other hand, a single large frame is unusable even if a single bit error occurs near the end, so sending the data as a burst of smaller frames might allow the receiver to “salvage” at least the beginning of the transmission. (Note that the “go-back-N” recovery strategy of LAPB requires that all frames received after an error be discarded and retransmitted, even if the later frames are received correctly).

² Although pipelining is to be avoided, more than one frame may be sent in a single transmission (i.e., without interrupting carrier between frames).

The sender therefore controls three parameters: transmission size, frame window
size and maximum frame size, and setting any two determines the third. Unfortunately, values that improve performance in the presence of noise conflict with those that minimize overhead. Therefore, it follows that there should be an optimal set of parameters for a given channel when operating at a given error rate and with a protocol that has a given header and ACK overhead.

This is indeed the case. A program³ was written to compute analytically the overall expected efficiency of an AX.25 transmission, given the following parameters:

1. A Gaussian bit error rate (each bit error being independent of all others, as would occur with “white” thermal noise).
2. A transmission size in bits.
3. The number of frames into which to divide the transmission.
4. The overhead in bits of a frame header and an ACK.

³ See Appendix 1 for the derivation of the formulas used in this program.

For all bit error rates studied ($10^{-?}$ to $10^{-?}$, ranging from a link that is almost completely unusable to one so good that almost anything works), maximum overall efficiency was always obtained when each transmission was limited to one frame. As expected, the optimum transmission (and frame) size increases with decreasing bit error rate. The efficiency of a large (and suboptimal) transmission can be improved by dividing it up into several consecutive frames. However, this is always less than the efficiency attained when the message is divided up into smaller single-frame transmissions, even when the extra ACK overhead is taken into account. In other words, a “stop-and-wait” (i.e., window size one) protocol with an appropriate frame size always uses the channel more efficiently than a sliding window protocol with a window size greater than one.

The “real world,” though, is a bit harder to characterize because many (if not most) errors are caused by collisions rather than insufficient S/N ratios. Errors are therefore much more likely to occur in bursts, with entire transmissions often rendered useless. Collisions are harder to model than thermal noise, so their effect on the efficiency of a sliding window protocol is harder to evaluate. However, it seems reasonable to argue that the “salvage value” of a multi-frame transmission that has encountered a collision is likely to be even less than one corrupted by thermal noise. This is because in CSMA, collisions tend to occur near the beginning of transmissions, during the “collision window” represented by the time needed for one round trip propagation delay. Since a hit in the first frame of a transmission renders all later frames unusable, dividing up the transmission into multiple frames simply increases header overhead without improving net performance.

Since we have shown that sliding windows are suboptimal for our application, and because they account for much of LAPB’s complexity, the first of our motivations for scrapping LAPB altogether and designing a new protocol better tailored to the radio environment becomes apparent. However, we recommend that current AX.25 TNCs be operated in a stop-and-wait mode. Simply set MAXFRAME to 1 and take steps to adjust the packet size to a value appropriate for the channel. This ensures that new data can never collide with a returning ACK, regardless of whether the implementation already enforces a “single outstanding transmission” rule. One implementation strategy that might be useful here is to hold additional outgoing data in a buffer until any previously sent data has been acknowledged, and then send as much as possible in the next data packet subject to the maximum packet size limit. This is much more efficient than “pre-packetizing” outgoing data according to special characters in the data (e.g., carriage return), since short lines would otherwise result in small transmissions and poor throughput. A similar strategy, the Nagle Rule [10], has become widely accepted by implementors of the ARPA Transmission Control Protocol (TCP), the most common transport (level 4) protocol used in the ARPA Internet [6,7].

One final issue, that of “Selective Reject,” becomes moot when LAPB is operated in a stop-and-wait mode. If only a single frame can be outstanding at any one time, there is no need for this special measure. In any event, simulations have shown that Selective Reject actually degrades performance relative to “standard” LAPB, especially on links with large window sizes, when errors occur in bursts (as they often do in the real world). [9] The same paper proposed an alternative known as “multi reject” but this is again unnecessary if the window size is one frame.

4. Congestion Collapse

Because LAPB made no provisions for channel sharing, as now implemented on amateur packet radio TNCs it is extremely prone to channel “congestion collapse,” a stable condition where virtually every transmission results in a collision. Congestion collapse can be avoided only when TNCs become able to adapt themselves dynamically to the load offered by other TNCs. This section proposes congestion control algorithms that should be made part of the formal AX.25 protocol specification. They will be of little help unless widely implemented, because there is little incentive (other than altruism) for one to use them unless everyone does.

Several factors combine to cause our currently severe congestion problems. The first is jump-on, where several stations with traffic to send collide with each other the instant the channel becomes free. The second and more serious problem is caused by hidden terminals. These are stations unable to hear (and defer to) transmissions from certain other stations. When this happens, the CSMA (carrier sense multiple access) algorithm breaks down and collisions become very frequent. Kleinrock [3] has shown that just one or two hidden terminals in a packet radio network are often enough to degrade performance below that of slotted Aloha, and it doesn’t take many additional hidden stations for performance to approach that of pure Aloha. Gower and Jubin [4] observe that carrier sensing actually makes things worse in the presence of hidden stations because it aggravates jump-on.

A more fundamental problem with packet radio is that all dynamic multiple-access schemes (regardless of carrier sensing, hidden terminals, delay and so forth) exhibit a “negative resistance” characteristic at high load levels. As the offered load increases, the collision rate increases and usable throughput decreases. While the exact point at which optimum throughput is obtained on a given channel varies widely depending on the algorithm and the station characteristics, there is always the need to control the offered load dynamically if it is to be operated efficiently. Unfortunately, this is not true for current AX.25 implementations.

4.1. Retransmission Backoff

The most important change that must be made to stabilize the AX.25 protocol is for successive retransmissions caused by the non-receipt of an ACK to be spaced out further and further over time. This technique is called backoff and many variations exist.

The most well-known example is binary exponential backoff used in the Ethernet local area network. [5] In Ethernet it is possible to detect a collision during transmission. When this occurs, the transmission is aborted and a backoff timer is set to a random value distributed evenly between zero and a number that doubles after each unsuccessful attempt. In mathematical terms, the timer is set according to the expression

$$T = S \cdot \mathrm{random}\left(0,\ 2^{\min(n,\,10)}\right)$$

where random returns a random number evenly distributed between its two arguments, $n$ is the retry number, $S$ is a proportionality constant, and the min function sets a maximum bound on the retransmission interval. This causes successive attempts by each station participating in a collision (there could be many stations all jamming each other simultaneously) to be spaced out over rapidly increasing intervals until the offered load on the channel drops to the point where successful transmissions can occur. As usual, should some retry limit (typically 16) be reached the packet is abandoned.

Base 2 gives reasonable performance and is easy to implement. However, other values can be used in the exponential calculation. Smaller bases back off less quickly, while larger bases cause rapid increases in the retransmission interval. If retransmissions are more often caused by poor link margins than by collisions, a smaller base (i.e., slower backoff) may be appropriate at the expense of slower adaptation to congestion.

Packet radio differs from Ethernet in that there is no immediate indication of a collision; one can be detected only by the non-receipt of an ACK. Furthermore, the interval between the transmission of a packet and the receipt of an ACK will vary depending on external factors such as channel speeds and the number of digipeaters. Therefore, more work is needed to adapt our backoff strategy to packet radio.

4.2. Retransmission Timing

In the most common implementations of AX.25 the estimated interval between transmission of a packet and reception of its ACK is called FRACK and is set manually. An ad-hoc rule is used to increase it according to how many digipeaters are included in a connection.

There are several problems with this approach.

1. A constant value cannot adapt to changing channel conditions. A value that gives efficient and quick response when the channel is lightly loaded may cause many unnecessary retransmissions when ACKs are slow in returning because of channel load.
2. Few users bother to tune this value to an appropriate value. If anything, users are more likely to set it too small because they get anxious when their TNCs don’t transmit when they “should”!

The presence of digipeaters is what makes this problem so difficult. There is no direct way to tell how much channel loading exists in sections of the network out of direct radio range, and hence no way to predict in advance how long one should wait before assuming that a transmission was lost in a collision. When digipeaters are phased out in favor of direct links between adjacent network nodes, a fixed interval with a timer that runs only when the channel appears to be clear should work well. In the meantime, however, we can borrow a technique from TCP and attack the static FRACK problem on long digipeater routes by computing it automatically. To do this we measure the time between the transmission of a given packet and the receipt of its ACK, always making sure that the retransmission timer is greater than this period. A “round trip timer” is created, providing continuous packet-by-packet measurements that take into account changing channel and digipeater loading.

Measured round trip times are only educated guesses about future behavior, because the path is statistical in nature (e.g., someone might unpredictably start a large file transfer through a remote digipeater). Note also that once the round trip timer is started, it should be allowed to run even if the packet being timed is lost and retransmitted. While this can cause anomalously large “spikes” in the round trip time measurements, the alternative (restarting the timer when retransmitting) is worse because an ACK for a previous transmission of the packet might come back immediately after the retransmission. This would yield an erroneously small estimate that might never be corrected. In any event, it is a good idea to process these measurements in two ways before using them to set the retransmission timer.

1. Compute a running average over several measurements to smooth out random fluctuations.
2. Include a “grace” factor to allow for sudden increases in the round trip delay.

The TCP specification recommends this formula for smoothed round trip time computation:

$$T_s' = \alpha T_s + (1 - \alpha)\, T$$

where $T_s'$ is the newly computed Smoothed Round Trip Time, $T_s$ is its previous value, $T$ is the round trip time encountered on an ACK just received, and $\alpha$ is a “tuning” constant ranging between 0 and 1. Large values of $\alpha$ cause $T_s$ to react slowly to changes in measured round trip time, while small values cause it to respond more quickly. The TCP spec recommends an $\alpha$ value between 0.8 and 0.9. Dave Mills, W3HCF, has shown [8] that using different values for $\alpha$ depending on whether the delay is increasing or decreasing has merit in the Internet environment. His experiments suggest using $\alpha = 3/4$ when $T > T_s$ and $\alpha = 15/16$ when $T < T_s$ to give a “fast attack” and “slow decay” response characteristic.

Once $T_s$ has been computed, TCP specifies that it be multiplied by another tuning constant, $\beta$, to arrive at the final retransmission timer value. The recommended value for $\beta$ is 2.0, i.e., the transmitter waits two round trip times for an ACK before retransmitting. The TCP specification also recommends that repeated unsuccessful transmissions be spaced out, although the exact backoff algorithm is not specified.

For AX.25 use, we can combine the round trip timer from TCP with the binary exponential backoff algorithm from Ethernet
and use them together to set the retransmission timer. To do this, we first observe that we should always wait at least one round trip interval (preferably $\beta$ intervals, to allow a grace interval as in TCP) for an ACK to come back. One possible result is the following:

$$T = T_s \, \beta \cdot \mathrm{random}\left(1,\ 2^{n}\right)$$

When packets are not being lost, the retransmission timer will always be set to $\beta$ times the smoothed round trip interval. Should several retransmissions occur in a row, however, both the average retransmission interval and variance will double on each retry.

4.3. Persistence

The round trip timing and backoff algorithms just described help stabilize the channel and prevent congestion collapse through negative feedback that attempts to operate the channel near the peak of its throughput efficiency curve. These are especially helpful when many hidden terminals are present, limiting the maximum attainable channel efficiency. Even without hidden terminals, however, efficiency can still suffer greatly in a heavily loaded network because of the jump-on problem.

One way to alleviate this is to change how stations with packets to send behave when they find an idle channel. Our current algorithm, namely “transmit as soon as you hear the channel go clear,” is formally called 1-persistent CSMA and leads to the jump-on problem. A less greedy alternative is p-persistent CSMA, where each station waits a random amount of time before transmitting even when the channel seems to be clear. Each station generates an evenly distributed random number between 0 and 1 and compares it to a constant, p, which also ranges between 0 and 1. If the random number is less than p, the station transmits; otherwise it waits a small amount of time (called the slot time, ideally one round-trip propagation delay) and repeats the procedure. As p approaches zero, channel efficiency theoretically approaches 100%. Unfortunately, however, delay also approaches infinity! For any non-zero value of p, however, the packet will eventually be sent after finite delay although channel efficiency will decrease.

Persistence works because the stations waiting for the channel will “spread out” their transmission attempts once the channel goes clear. Assuming only one station decides to transmit in each slot, or round trip interval, the other stations on the channel will hear and defer to it before they attempt to transmit themselves.

Clearly there is an optimum value of p for a given level of channel activity. If p is too large, too many stations will jump on each other in the first slot. If p is too small, many slots will go to waste before the stations eventually transmit, and transmission delays will increase unnecessarily. It is therefore best to set p dynamically, if possible. At low load levels, p=1 gives the best performance; as the channel reaches 75-80% offered load (i.e., the channel appears busy 75-80% of the time with either good packets or collisions) the value should be decreased.

The backoff algorithm given in the previous section can be viewed as a way to adjust the value of p dynamically when timeouts occur, since doubling the retransmission interval corresponds to halving the value of p. The only real difference is that the backoff interval is evenly distributed while the persistence “timer” is exponentially distributed.

TAPR TNCs pioneered the use of DWAIT, a persistence-like feature. It should be enhanced to provide true persistence (i.e., by randomization). A worthwhile research project would be the development of a good algorithm for the real-time evaluation of p, with the goal of incorporating it into a future revision of the AX.25 specification. One possible approach is to sample the state of the DCD line periodically and estimate the channel activity. The value of p could then be found in a lookup table. If more retransmissions occur than expected for the measured amount of traffic (e.g., due to hidden terminals), then p could be modified as appropriate.

5. Some Final Thoughts on Congestion Control

Congestion control in packet radio networks, and in packet switching networks in general, is a notoriously difficult problem and has received much research attention. The suggestions made here, however, should greatly alleviate the problem and make our AX.25 digipeater networks halfway usable under heavy load.

However, it may turn out that radically new approaches to transmitting bits and packets over amateur radio links will hold a better solution.

1. Spread spectrum with different spreading sequences assigned to each receiver eliminates channel collisions except for the less likely case where two transmitters attempt to send to the same receiver at the same time.
2. Since the received signal levels in a packet radio network usually vary widely, modulation methods and forward-error-correcting (FEC) techniques that exhibit higher degrees of “capture” may help by minimizing collisions where nobody “wins.”
3. Busy Tone Multiple Access (BTMA) [3] involves receivers indicating on a separate radio channel (with a “busy tone”) that they are actively receiving a packet. Stations monitor the busy tone rather than the data channel to determine when to defer. If the power levels and propagation conditions between the main and busy-tone channels are symmetric, it is possible with BTMA to avoid completely the hidden terminal problem. While this technique requires twice as many radios (along with the ability to operate in full duplex, although this could be crossband), throughput could easily more than double.
4. Conventional “bent-pipe” analog or regenerative digital repeaters of the split-frequency type also reveal hidden terminals. While this is contrary to the trend toward digipeaters, they may be the most expedient way to solve a particularly difficult case.

This list only touches on the possible approaches to congestion control. Above all, we need to leave plenty of room for experimentation into these and other “low level” problems. Our experience has shown that there cannot be a single, “standard” link level protocol that is optimum everywhere. Each link protocol must be customized to a specific environment, and the network architecture (e.g., higher level protocols) must take this fact of life into account and be flexible enough to accommodate them all.

6. A Replacement for LAPB

As we have seen, LAPB is out of its natural element (a reliable, full duplex point-to-point link) when it is used on a lossy, half duplex packet radio channel. LAPB lacks the features needed to control congestion and improve performance on bad links. It also contains unnecessary “features” that increase the size of an implementation and may actually degrade performance. This section is an attempt to apply the lessons learned from LAPB to the design of a new protocol, one much better suited to the amateur radio environment.

6.1. Link Level Acks versus Connections

An important point needs to be stressed here. Link level acknowledgements (desirable for performance reasons when operating on a lossy channel) do not necessarily imply link level connections. A packet switch operating in a large metropolitan area may serve a total of several hundred users, but only a small fraction may be active at any one time. Why require the switch to maintain hundreds of mostly idle link level protocol control blocks for all these users, or alternatively saddle users with the burden of setting up a link level “connection” to their local switch before they may communicate with their final destinations?

Veterans of the amateur packet radio “protocol wars” will recognize this as an argument for datagram-oriented networks. We believe that the connection- (or virtual-circuit-) oriented nature of LAPB is another major contributor to its complexity. We further believe that a redesigned protocol that enhances link level reliability with acknowledgements without explicit connection management procedures will be both simpler and easier to implement.

6.2. Window Control

As shown by our analysis of LAPB, sliding windows are unnecessary in a half duplex packet radio environment, so our replacement is a simple stop-and-wait protocol. Even when traffic is pending to more than one station, only one data packet may be sent at a time. An ACK must be received for this packet (or a “give up” interval exceeded) before another packet can be sent to this or any other station. This avoids the ACK collisions that could occur if we were to send traffic to two or more
stations at one time. This also simplifies the implementation, since all traffic to be sent on a given channel can be kept on a single queue regardless of destination.

6.3. Acknowledgement Piggyback

Another feature of LAPB that makes its contribution to complexity is the ability to “piggyback” an ACK on a data frame traveling in the reverse direction. This is a difficult feature to use in practice, because there is seldom a data frame available at precisely the right time on which to piggyback an ACK. Most applications operate in a “pseudo half duplex” mode: one host sends one or more packets to its peer, and one or more packets later flow in the opposite direction. Some protocol implementations delay the transmission of ACKs in the hope that a higher layer will soon generate an outgoing data frame on which it could be piggybacked. This seldom succeeds, however, because any delay must be kept short both to prevent unnecessary retransmission and to keep the round trip time small enough to avoid affecting throughput. We therefore felt that ACK piggybacking was not necessary in our protocol.

6.4. Acknowledgment Retransmission

In LAPB, a lost ACK causes the sender to timeout and retransmit a data frame, even though it has already been received correctly. When loss rates are low, this is not a serious problem. However, should a substantial fraction of the ACKs be lost, the unnecessary retransmission of data frames by the sender merely to elicit ACK retransmissions from the receiver can cause a considerable loss of performance. This problem was first recognized eleven years ago by the designers of the first packet radio network, the ALOHANET, who proposed an elegant solution that we have dubbed the ACK-ACK protocol⁴ [11].

⁴ We considered calling this the “Bill the Cat Protocol,” but thought the reference too obscure.

As its name implies, the ACK-ACK protocol provides for the acknowledgement and retransmission, if necessary, of ACKs to prevent the unnecessary retransmission of data packets. Let’s look at an example using LAPB more closely to show why ACK-ACK gives better performance. Assume that the probability of a data packet or ACK being successfully received in one try is $p_d$ or $p_a$, respectively. In order for the sender to expect to receive one ACK, the receiver will, on the average, have to transmit its ACK $1/p_a$ times. In order for the receiver to transmit this many ACKs, however, it will also have to receive $1/p_a$ data packets. Therefore, the sender can expect to send

$$\frac{1}{p_d} \times \frac{1}{p_a}$$

packets for each ACK received. For example, if the probability of a data packet or ACK being successfully received is 25%, then $p_d = p_a = 0.25$, and the sender will have to send each data packet an average of sixteen times before an ACK is finally received and it can continue with the next data packet! On such a link, a greater attempt by the receiver to ensure receipt of its ACKs by the sender is worthwhile.

Given that nothing can be done about the “raw” value of $p_a$, then the only thing the receiver can do to increase reliability is to retransmit ACKs whenever necessary. If the receiver were to retransmit each ACK up to $N$ times, then the probability that at least one attempt succeeds is

$$1 - (1 - p_a)^N$$

If the probability that a data packet is successfully received is $p_d$, then the expected number of data packet transmissions that result when the receiver makes $N$ attempts to transmit each ACK is

$$\frac{1}{p_d} \times \frac{1}{1 - (1 - p_a)^N}$$

For our earlier example where $p_d = p_a = 0.25$, if $N=5$ then the probability becomes .76 that at least one ACK will make it back to the sender. This decreases the expected number of data packet transmissions from 16 to 5.25, a substantial improvement.

The receiver does this by setting a timer when it transmits its ACK and retransmitting it if an acknowledgement of its acknowledgement (an “ACK-ACK”) does not return before the timer expires. The receiver may also accept the next data packet from the sender as confirmation that its ACK of the preceding one was accepted, because the sender will not continue with the next data packet until the one
outstanding has been acknowledged (remember this protocol operates in stop-and-wait mode). This means that when one station sends a steady stream of traffic to another, no additional packets over the conventional protocol case are sent. Only at the end of a burst of traffic (or after an isolated packet) is an extra ACK-ACK packet generated. Clearly, the ACK retransmission timer must be shorter than the data retransmission timer; this ratio corresponds to the value of N in the above equations, the number of times the receiver will attempt to retransmit the ACK before the data packet is retransmitted unnecessarily.

To make the protocol reliable, we need to include a frame identifier (ID) with each transmitted data frame and ACK. The ID ensures that the sender and receiver do not get out of step, possibly losing or duplicating a data frame. The ID need not be a sequential value; it need only be different from one frame to the next. If the receiver receives a data frame with the same source address and ID as the last received frame, a normal ACK is sent but the frame is otherwise ignored as a duplicate.

The ACK-ACK protocol can be viewed as establishing an implicit, unidirectional “connection” with the first data frame, which exists only as long as there is data to send. Once an ACK-ACK is sent to show that no more data is available, the sender and receiver need retain no further state (i.e., addresses and ID) and the “connection” is torn down. The sender then repeats the process with any other station for which it has traffic.

Only one implicit “connection” exists at any moment (although the destination to which we are “connected” can change very rapidly) since our half duplex operating rules require that only one data packet be outstanding at a time. If a third station sends us data while we’re already exchanging data with another station, then this new data is put on a queue and processed after we’ve completed our current transfer sequence. If a data frame requesting an ACK should arrive from a new station, we do not immediately acknowledge it because that would encourage it to send more data. This might interfere with the station we’re already communicating with, so we hold this frame on a queue for processing once we return to an idle state. (However, an ACK from such a station may be processed immediately, as may incoming data that does not request an ACK). There is of course the chance that we might delay processing of the new data so long that a timeout occurs. This can be largely avoided, however, if the link retransmission timers are chosen to be longer than the time needed to send a typical multi-packet “burst.” These can be limited by appropriate window sizes in the end-to-end (transport) protocol. The alternative, immediately acknowledging the new data but telling the sender to hold off on more, is possible but complicates the protocol. Periodic “probes” from the other station would be needed to guard against the deadlock that would occur if our “go ahead” message is lost, and these could also collide with packets to or from the station we’re already communicating with. The simple acknowledgement delay strategy thus seems workable, especially if the data retransmission strategy backs off rapidly (e.g., exponentially). It also simplifies the code greatly, since managing multiple connection state control blocks is usually the hardest part of a “multi connect” TNC.

Since radio channels can vary widely in quality, it is desirable to make the use of ACK-ACKs and even ordinary ACKs optional. This is especially useful when supporting datagram-oriented network layer protocols such as IP, since it is more efficient to dispense entirely with the overhead of link level ACKs and rely on end-to-end retransmission of lost packets when link error rates are very low. Our protocol therefore provides for an indication in each packet (data or ACK) whether a reply (ACK or ACK-ACK) is expected for this transmission. It is therefore easy to adapt this protocol to changing radio conditions or user reliability requirements. An ACK-ACK need be sent only when no more data is available, since the next data packet awaiting transmission may serve as an ACK-ACK. An ACK-ACK may then be represented merely by an empty data packet with the “acknowledgement requested” bit turned off. This simplifies both the concept and the implementation of the protocol considerably.

6.5. ACK-ACK Preliminary Specification

In specifying frame formats corresponding to data, ACK and ACK-ACK messages, we have tried to adhere to the basic structure of the AX.25 datagram sublayer. In other words, we keep the standard AX.25 address field layouts and attempt to use the existing control flag definitions whenever possible. Data is transmitted as a UI frame with poll and command turned on, ACK is transmitted as a UA frame with final and response turned on, and ACK-ACK is transmitted as an empty (no data in the data field) UI frame with poll turned off and command turned on.

6.5.1. Frame Formats

The frame format is similar to that of AX.25. The fields are named and the length of the field in bits follows the field name and is enclosed in parentheses. Here is the basic frame:

F    Dest  Src   Digi     Ctl  PID  ID   Data  FCS   F
(8)  (56)  (56)  (0-448)  (8)  (8)  (8)        (16)  (8)

Where

F     Flag, length 8 bits. Value 7E hex, not bit stuffed.

Dest  Address of destination, length 56 bits. This is the standard AX.25 address/SSID combination.

Src   Address of source, length 56 bits. This is the standard AX.25 address/SSID combination.

Digi  Address of repeater(s), length varies from 0 (no digipeaters) to 448 (8 digipeaters). This is carried over from AX.25, although deprecated for the networks in which this protocol is likely to be used.

Ctl   Control field, length 8 bits. The content determines the frame type, either UI or UA. The following values are in binary:

      UI  000p0011
      UA  011p0011

      where ‘p’ is the poll/final bit.

PID   Protocol identifier, length 8 bits. This identifies the layer three protocol and follows the same conventions as AX.25.

ID    Frame ID, length 8 bits. This distinguishes the frame from the previous and subsequent frames to avoid duplicate and lost frames.

Data  Data, length 0 to 523,688 bits (0 to 65,461 bytes). The sender and receiver should agree on a maximum frame length not to exceed 65,461 bytes in length.

FCS   Frame check sequence, length 16 bits. This contains the CRC-CCITT checksum of the frame.

F     Trailing flag.

The ID byte is probably best set from a single counter used for all transmissions regardless of destination. The only requirement placed on the sender is that the value of the ID field not be duplicated in any two subsequent frames sent to the same destination; duplication can be avoided by limiting the transmit queue to 255 entries.

6.5.2. Flow of Data Between Stations

This section is a preliminary specification of the ACK-ACK protocol in an informal, narrative style. As this definition is refined and implemented, a more formal description is being written. Contact the authors for more information.

Data is transferred from station A to B in the following way. Station A selects an ID value, sends the data (UI) frame to B, and starts a timer with interval $T_d$. B notes the sender address and ID number, responds with an ACK (UA) containing the ID just received, and starts a timer with interval $T_a$.

After receiving an ACK from B for the most recently sent frame, A responds by sending the next data frame and restarting its timer if there is more data to send, or sending an ACK-ACK (empty data frame without poll request) and stopping its timer if there is no more data to send. When B receives the next data frame or an ACK-ACK, B discards any ID it is currently retaining from A and either sends an ACK with the new ID (restarting its timer) or goes into the idle state (stopping its timer) as appropriate.

If A should fail to receive an ACK containing the correct ID from B within the timeout interval $T_d$, then it increments a retry counter, retransmits the data frame and restarts its timer. If the retry counter exceeds a fixed limit, additional transmission attempts for this frame are aborted, and if possible, the packet should be returned to its original source with an error indication. Higher level protocols are then responsible for any further error recovery procedures. If B fails to receive either another data frame or an ACK-ACK from A within its timeout period $T_a$, then it resends its ACK, restarts its timer and increments a retry counter. If the same data frame is received again from A, B resets its retry counter, resends an ACK and restarts its timer but otherwise ignores it as a duplicate. If there is no response at all from A after N ACK retransmissions, where N is the ratio $T_d/T_a$, B abandons any further retransmission attempts and acts as though an ACK-ACK had been received (i.e., it discards the ID and sender information and returns to a quiescent state). Higher level protocols are responsible for detecting and recovering from the residual possibility for packet duplication this allows.

Because of its small amount of state, this protocol can be implemented simply and with little memory. Communicating with several different stations in “rapid fire” sequence is easy; the receiver need retain the last ID value received from a particular sender only as long as data is actively being transferred. Special connection establishment packets are unnecessary.

7. Selection of Appropriate Values for Tuning Parameters

There are three parameters that should be regularly adjusted or “tuned” to the existing conditions on the channel. These parameters are frame size, transmit timeout timer $T_d$, and ACK timeout timer $T_a$.

The frame size should be adjusted to the optimum value based on the current channel statistics. As more frames are lost the frame size should be shortened until the optimum operating point is reached as described in Appendix 1.

The correct timer values are much simpler to calculate. The transmit timer should be set to some value that is significantly greater than the round-trip time for a frame to travel from sender to receiver and for an ACK to return. The receiver should also calculate the round-trip time and determine how many ACKs are required to ensure that an ACK reaches the sender. The receiver should set the value of $T_a$ to be $T_d$ divided by the number of ACK transmissions N necessary to increase the probability that an ACK arrives at the sender to the desired level. In order for this technique to work properly both the sender and receiver need to agree on the formula that relates round trip time to the selected value of $T_d$. Both timers should be randomized and backed off as described earlier in the congestion control section.

As soon as $T_d$ (equal to $N T_a$) expires at the receiver, the receiver should cease sending ACKs. This prevents the ACKs from colliding with and destroying a retransmitted data frame. If the sender should fail to receive an ACK it will resend the frame. The receiver repeats the process of acknowledgement but discards the frame.

7.1. Performance

ACK-ACK was compared to AX.25 for various links and message sizes using a Monte Carlo program that simulates both ACK-ACK and AX.25 in identical environments. In all configurations a window size of 1 (stop and wait) was found to be most efficient and ACK-ACK to be more efficient than LAPB/AX.25. Performance improvements over LAPB/AX.25 ranged from a low of 1.21 for very reliable links to a situation involving 2 digipeaters where ACK-ACK delivered the data and AX.25 failed to deliver all the data within the running time of the simulation. In the latter case, when the simulation was terminated the performance improvement over AX.25 was already over 100.

7.2. Conclusion

LAPB/AX.25 is a good protocol for point-to-point communication over reliable full-duplex links. ACK-ACK is more desirable than AX.25 in high error rate and/or half-duplex environments because it provides significantly greater throughput, better channel utilization, and is much simpler to implement.

8. References

[1] Fox, T., ed., “AX.25 Amateur Radio Link Layer Protocol Version 2,” American Radio Relay League, 1985.

[2] CCITT Study Group VII, “Interface Between Data Terminal Equipment (DTE) and Data Circuit-Terminating Equipment (DCE) For Terminals Operating in the Packet Mode on Public Data Networks,” Recommendation X.25, 1984.

[3] Kleinrock, L., and Tobagi, F., “Random Access Techniques for Data Transmission over Packet-Switched Radio Channels,” National Computer Conference, 1975, pp. 187-201.

[4] Gower, N., and Jubin, J., “Congestion Control Using Pacing in a Packet Radio Network,” IEEE Conference on Military Communications, 1982.

[5] Metcalfe, R. M., and Boggs, D. R., “Ethernet: Distributed Packet Switching for Local Computer Networks,” Commun. ACM, vol. 19, pp. 395-404, July 1976.

[6] Postel, J., ed., “Transmission Control Protocol Specification,” ARPA RFC 793, September 1981.

[7] US Department of Defense, “Military Standard Transmission Control Protocol,” MIL-STD-1778, 12 August 1983.

[8] Mills, D., “Internet Delay Experiments,” ARPA RFC 889.

[9] Brady, Paul T., “Performance Evaluation of Multi-Reject and Selective Reject Link-Level Protocol Enhancements,” Bell Communications Research. (Paper submitted to ICC Toronto, June 1986).

[10] Nagle, J., “Congestion Control in IP/TCP Internetworks,” ARPA RFC 896.

[11] Binder, R., et al., “ALOHA Packet Broadcasting - A Retrospect,” Proceedings of the 1975 National Computer Conference, page 208-209.

Appendix 1: Efficiency of a Go-Back-N Protocol

Two interrelated factors must be taken into consideration when evaluating the efficiency of a go-back-N sliding window protocol on a noisy channel: the effect of header overhead (limiting efficiency even on an error-free channel) and the effect of garbled frames. This section derives the formulas for these two factors that were used to arrive at the conclusion that a window size of 1 (i.e., a stop-and-wait protocol) maximizes channel efficiency when operated on a half duplex channel. To determine the net efficiency of the channel, these two factors are evaluated with the desired parameter values and then multiplied.

Header Overhead

If there are D data bits in a transmission, H frame header bits, and N frames in a transmission, then the ratio of data bits to total bits sent is

$$\frac{D}{D + NH}$$

If we include the overhead of an ACK (A bits) then this becomes

$$\frac{D}{D + NH + A}$$

Unusable Frames

If the channel bit errors occur independently at rate R (i.e., caused by Gaussian sources such as “white” thermal noise) then the probability that a frame of length D+H is received without errors is simply $(1-R)^{D+H}$. In this section we call this quantity G, the probability of receiving a good frame. Note that G decreases as we increase D; i.e., the longer the frame, the greater the chance it will be corrupted in transmission, unless the channel has a perfect bit error rate (R=0).

In the go-back-N error recovery technique, only the next expected frame can be processed; any other frames received are discarded, even if they are received correctly (i.e., with a valid CRC).

Consider this example. Four frames numbered 1, 2, 3 and 4 are sent in one transmission. Frame 1 is received normally, but frame number 2 is corrupted. Even if frames 3 and 4 are received correctly, the receiver will only acknowledge frame number 1. Frames 2, 3 and 4 will have to be resent, even though only frame 2 was actually lost.

To analyze the performance of a sliding-window protocol with go-back-N recovery, we need to find the expected number of usable (with correct CRC and in correct sequence) frames in a transmission containing N frames. We then divide this number by N, yielding the normalized efficiency of the protocol (the number of usable frames received divided by the total number sent). G is defined as the probability of any specific frame within the transmission being received correctly, and we assume that this value is the same for all frames (i.e., the frames are all the same size and errors are independent). Clearly, if we

were able to use every correctly received frame (i.e., if we did not discard frames received out-of-sequence due to the loss of an earlier frame) then this efficiency would simply be G, for any value of N.

In the go-back-N case, however, the analysis is more complicated. The probability that zero frames are usable is (1-G), i.e., the probability that the first frame is in error, rendering it and all later frames unusable. The probability that only one frame is usable is G(1-G), the probability that the first frame is received correctly and the second one in error, and so on up to the probability that all N frames are usable. Summing over all possibilities and weighting by the number of usable frames for each, we obtain

$$(1-G)\sum_{i=0}^{N-1} i\,G^{i} + N G^{N}$$
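As a small worked instance of this weighted sum (our own illustration, with N=2 and G=0.9 chosen arbitrarily), the expected number of usable frames is

$$(1-G)\left(0 \cdot G^{0} + 1 \cdot G^{1}\right) + 2G^{2} = G - G^{2} + 2G^{2} = G + G^{2}$$

so with G = 0.9 a two-frame transmission yields on average 0.9 + 0.81 = 1.71 usable frames, or a normalized efficiency of 0.855, compared with 0.9 when each frame is sent alone.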

By factoring out G from the series and then integrating and differentiating the finite sum, we obtain

$$(1-G)\,G\,\frac{d}{dG}\int \sum_{i=0}^{N-1} i\,G^{i-1}\,dG + N G^{N}$$

Integrating,

$$(1-G)\,G\,\frac{d}{dG}\sum_{i=0}^{N-1} G^{i} + N G^{N}$$

This now contains the sum of a finite geometric series. Substituting with the closed form of the sum, we obtain

$$(1-G)\,G\,\frac{d}{dG}\left[\frac{1-G^{N}}{1-G}\right] + N G^{N}$$

Differentiating,

$$(1-G)\,G\,\frac{1 - G^{N} - N G^{N-1} + N G^{N}}{(1-G)^{2}} + N G^{N}$$

Simplifying,

$$\frac{G\,(1-G^{N})}{1-G}$$

And normalizing by N, the number of frames sent in each transmission, we finally obtain

$$\frac{G\,(1-G^{N})}{N\,(1-G)}$$

A quick check shows that substituting N=l for the single-frame case yields G, as expected.
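The closed form can also be checked numerically against the weighted sum it was derived from. The following Python fragment is only a sketch of such a check: the function names are our own, and the bit error rate and frame sizes in the demonstration are hypothetical values, not measurements from the simulations described in this paper.

```python
def expected_usable_frames(G: float, N: int) -> float:
    """Expected usable frames, evaluated directly from the weighted sum:
    (1 - G) * sum_{i=0}^{N-1} i * G**i  +  N * G**N."""
    weighted = sum(i * G**i for i in range(N))
    return (1.0 - G) * weighted + N * G**N


def gbn_efficiency(G: float, N: int) -> float:
    """Normalized go-back-N efficiency, closed form G(1 - G^N) / (N(1 - G))."""
    if G == 1.0:
        return 1.0  # limit of the closed form on an error-free channel
    return G * (1.0 - G**N) / (N * (1.0 - G))


def good_frame_probability(R: float, frame_bits: int) -> float:
    """G = (1 - R)^(D + H) for independent bit errors at rate R."""
    return (1.0 - R) ** frame_bits


if __name__ == "__main__":
    # Hypothetical channel: bit error rate 1e-4, 1024 data bits + 160 header bits.
    G = good_frame_probability(1e-4, 1024 + 160)
    for N in (1, 2, 4, 7):
        direct = expected_usable_frames(G, N) / N
        closed = gbn_efficiency(G, N)
        print(f"N={N}  direct={direct:.6f}  closed={closed:.6f}")
```

For any G below 1 the two columns should agree, and the value falls as N grows. This is the go-back-N factor only; it still has to be multiplied by the header overhead factor to obtain the net channel efficiency.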
