Local Error Recovery in SRM: Comparison of Two Approaches
Ching-Gung Liu, Deborah Estrin, Scott Shenker, and Lixia Zhang

Ching-Gung Liu is with the Fujitsu Laboratories of America, Inc., e-mail: charley@fla.fujitsu.com. Deborah Estrin is with the Computer Science Department/Information Sciences Institute, University of Southern California, e-mail: estrin@usc.edu. Scott Shenker is with the Xerox Palo Alto Research Center, e-mail: shenker@parc.xerox.com. Lixia Zhang is with the Computer Science Department, University of California, Los Angeles, e-mail: lixia@cs.ucla.edu. This research was supported in part by the Advanced Research Projects Agency, monitored by Fort Huachuca under contracts DABT…C…, and by the National Science Foundation under grant award No. NCR…. The views expressed here do not reflect the position or policy of the U.S. government.

Abstract: SRM is a framework for reliable multicast delivery. In order to maximize the collaboration among the group members in error recovery, both retransmission requests and replies are multicast to the entire group. While SRM effectively uses random timers to suppress duplicate requests and replies, the global nature of the requests and replies means that every packet loss results in at least one request and one reply message sent to the entire group. To further improve the scalability of SRM, one must localize the scope of error recovery traffic. In this paper we present two approaches to local recovery: hop-based scope control and the use of local recovery groups. The first approach uses hop count to limit the distribution of requests and replies, whereas the second approach confines error recovery traffic using separately addressed local recovery groups. The local recovery groups and hop count settings are automatically created and dynamically adjusted based on observed loss patterns. We use simulation experiments to examine the performance of both approaches.

Introduction

Scalable Reliable Multicast (SRM) is a framework for reliable multicast delivery; it guarantees data delivery to all members in a multicast session. The mechanisms needed to achieve this reliability can be decomposed into two parts: session message exchange and receiver-initiated error recovery. Members periodically exchange session messages to report current group state (e.g., the highest received sequence number from each source), so that losses can be detected, and to determine the propagation delays between each pair of members.¹ The error recovery mechanism is receiver-initiated and NAK-based: receivers are responsible for detecting data losses and requesting retransmissions.² These retransmission requests and the resulting replies are multicast to the entire group.

Members use propagation delays to schedule their request and reply timers: each member detecting a loss waits for a random time period before sending the retransmission request, and similarly, each member receiving a retransmission request waits for a random time period before sending the reply. When members receive a retransmission request (or, respectively, a reply) while waiting to send one of their own, they cancel their scheduled transmission. This enables SRM to suppress duplicate requests and replies and thus avoid the request and reply implosion problem. However, each packet loss still results in at least one request and one reply message being sent to the entire multicast group, which limits the scalability of SRM as network and group size increase. As suggested in […], the premise of this paper is that the error recovery mechanism should isolate error recovery traffic to the required scope.³

In this paper we present two different mechanisms to localize the scope of error recovery traffic. The hop-scoped error recovery mechanism uses hop count to limit the distance that request and reply messages can travel. In contrast, the group-scoped error recovery mechanism confines the propagation of error recovery traffic by distributing it to subsets of the group using separate multicast addresses. Simulation results suggest that both mechanisms reduce error recovery traffic without introducing significant overhead.

Note that local recovery is a performance optimization; thus the mechanisms do not have to achieve an optimal or precise degree of locality: the more local the recovery, the less recovery traffic overhead there is. In both mechanisms proposed here, a member may occasionally send its requests and replies to an inappropriate scope. While such mistakes have a slight impact on the performance of SRM in terms of the volume of error recovery traffic, they have no impact on its correctness, since all data losses are eventually recovered.

The paper is organized as follows. The next two sections describe the hop-scoped and the group-scoped error recovery mechanisms, respectively. The following section presents the simulation models and analyzes the simulation results, and the section after that reviews related work. We conclude in the final section with a short summary.

¹ Although the delay from a member and the delay to a member may be different, SRM assumes that the path between a pair of members is symmetric; thus each member determines its one-way delay to another member by taking half of its measured round-trip delay. This symmetry assumption should cause no performance penalty even when the paths are asymmetric, because the delay between members is mainly used to differentiate members in setting their retransmission timers.
² SRM assumes that most or all session members, not only the data source, save all the application data. If some members do not save the data requested, they simply do not participate in the error recovery process.
³ The discussion of localizing session messages is outside the scope of this paper. More information can be found in […].
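The timer-based request suppression described in the introduction can be illustrated with a small model. This is a minimal sketch under stated assumptions, not the SRM implementation: request timers are assumed to be drawn uniformly from [c1·d, (c1 + c2)·d], where d is the member's estimated one-way delay to the source (half the measured round-trip time, following the symmetry assumption described in the text); the constants c1 and c2 and all function names are hypothetical; every request is multicast to the whole group; and no request is lost.

```python
import random
from typing import Dict, FrozenSet, List

# Delays are keyed by unordered member pairs, reflecting the assumption
# that paths are symmetric (one-way delay = half the round-trip time).
PairDelays = Dict[FrozenSet[str], float]


def one_way_delay(rtt: float) -> float:
    """Estimate the one-way delay as half the measured round-trip time."""
    return rtt / 2.0


def schedule_request_timers(delay_from_source: Dict[str, float],
                            c1: float, c2: float,
                            rng: random.Random) -> Dict[str, float]:
    """Draw each member's request timer uniformly from [c1*d, (c1+c2)*d].

    c1 and c2 are hypothetical tuning constants; d is the member's
    estimated one-way delay to the source.
    """
    return {m: rng.uniform(c1 * d, (c1 + c2) * d)
            for m, d in delay_from_source.items()}


def surviving_requesters(timers: Dict[str, float],
                         delays: PairDelays) -> List[str]:
    """Return the members that actually multicast a request, in send order.

    A member cancels its pending request if a request from another member
    arrives (send time + one-way delay) before its own timer fires.
    Only members that fire earlier can suppress it, so processing the
    timers in firing order is sufficient.
    """
    senders: List[str] = []
    for member, fire_at in sorted(timers.items(), key=lambda item: item[1]):
        suppressed = any(timers[s] + delays[frozenset((s, member))] < fire_at
                         for s in senders)
        if not suppressed:
            senders.append(member)
    return senders


if __name__ == "__main__":
    rng = random.Random(7)
    timers = schedule_request_timers({"a": 1.0, "b": 1.2, "c": 2.0}, 2.0, 2.0, rng)
    delays = {frozenset(("a", "b")): 0.3,
              frozenset(("a", "c")): 1.0,
              frozenset(("b", "c")): 0.8}
    print(timers, surviving_requesters(timers, delays))
```

Because timers scale with the delay from the source, members nearer the source tend to fire first, and their globally multicast requests are the ones most likely to suppress others. Limiting a request's scope weakens exactly this effect.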
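Hop-Scoped Error Recovery

The simplest way to control the scope of requests and replies is to limit the number of hops they travel. We wish to use the minimum hop counts possible in requests and replies. To minimize the hop limit for request messages, our design takes the approach that a member p's request extends just far enough to reach some other member q that is closer to the source. If the loss occurred between s and p, then q will be able to retransmit the lost packet. If the loss occurred elsewhere, so that q missed the packet as well, then we only need to make sure that q will send a request further up toward the source. All that matters is that at least one request makes it across the lossy link (i.e., the link where the loss occurs), and this request most likely comes from the closest member behind the lossy link, since each member's request timer is set proportional to the measured delay from the source.

Request Hop Count

Each requester simply sets its request hop count large enough to reach at least one member that is closer to the source. This member does not necessarily share the same data delivery path with the requester; all that matters is that the upstream member is closer to the source than the requester. Hence, for a member p regarding a source s in a session G, the hop count $d_p^s$ needed to reach an upstream member can be set to

$$ d_p^s = \min\{\, h_{pq} \mid q \in G,\ h_{sq} < h_{sp} \,\} $$

where $h_{pq}$ is the distance, in number of hops, from p to q.

In the original SRM design, a request is used to suppress duplicate requests as well as to ask for repair. While limiting the hop count of a request message limits the overhead it generates, it also diminishes its ability to suppress other members from sending the same request. Fortunately, the request hop count in our mechanism is, generally speaking, small compared to the distance (in number of hops) to the source. Thus the request overhead per loss is acceptable even though multiple requests for retransmitting the same data are generated. Moreover, because request timers are based on the propagation delay from the source, a member far behind the lossy link may receive a reply before sending its own request.

Figure …: Multicasting requests with a limited hop count reduces the effectiveness of request suppression (scenario … in the string topology).

Because a limited hop count reduces the effectiveness of request suppression, multiple requests regarding the same loss may be generated. In particular, two scenarios, illustrated in Figures … and …, are known to cause duplicate requests. In Figure …, the hop count to an immediate upstream member is smaller than the hop count to an immediate downstream member; therefore requests sent by upstream members cannot reach their downstream members to suppress them from sending out the same requests. In the worst case, the request overhead within the loss region can be two requests per link, one traveling upstream and the other downstream. We consider such overhead acceptable.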
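The request hop count $d_p^s$ can be computed directly from pairwise hop distances, and the same few lines reproduce the duplicate-request effect discussed in this section. This is an illustrative Python sketch, not the authors' code: the string topology, the member names s, A, B, C, and the function names are hypothetical. It also assumes that the source s is itself a member of G (so the set in the definition is non-empty for any p ≠ s) and that a request sent with hop count d reaches exactly the members within d hops.

```python
from typing import Callable, Iterable, List

Hops = Callable[[str, str], int]


def request_hop_count(p: str, s: str, members: Iterable[str], hops: Hops) -> int:
    """Compute d_p^s = min{ h_pq | q in G, h_sq < h_sp }.

    This is the smallest hop count from p that reaches some member q
    strictly closer (in hops) to the source s than p is.
    """
    candidates = [hops(p, q) for q in members
                  if q != p and hops(s, q) < hops(s, p)]
    if not candidates:
        raise ValueError(f"no session member is closer to {s} than {p}")
    return min(candidates)


def reached_by_request(p: str, s: str, members: List[str], hops: Hops) -> List[str]:
    """Members (other than p) within p's request hop count."""
    d = request_hop_count(p, s, members, hops)
    return [q for q in members if q != p and hops(p, q) <= d]


# Hypothetical string topology: each value is the member's hop distance
# from the source s along a single chain of routers.
position = {"s": 0, "A": 1, "B": 2, "C": 5}
members = list(position)


def line_hops(x: str, y: str) -> int:
    """Hop distance between two members on the chain."""
    return abs(position[x] - position[y])


if __name__ == "__main__":
    for m in ("B", "C"):
        print(m, request_hop_count(m, "s", members, line_hops),
              reached_by_request(m, "s", members, line_hops))
```

Suppose the loss occurs on the link between A and B. B's request (hop count 1) reaches A, which can repair the loss, but it does not reach C, three hops downstream. C's request timer is therefore not canceled by B's request, and unless a reply reaches C first, C also sends a request (hop count 3, reaching B). The upstream member's short hop count is what prevents suppression of its downstream neighbor.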
While we greatly limit the scope of requests, we require that a reply have sufficient scope to reach all members who share the same loss. Since a replier does not know where a packet was dropped, it is difficult for the replier to decide how far the retransmission must go. However, if a requester assumes that it is immediately behind the lossy link, it can determine, as we show below, an upper bound on the hop count needed to reach all other members behind the lossy link. This upper bound is called the proxy hop count, because the requester acts as a request proxy for members who share the same loss. When a replier receives a request … requester h hops away and the