
PATHALIAS
or
The Care and Feeding of Relative Addresses

Peter Honeyman
Computer Science Department
Princeton University
Princeton, New Jersey 08544
princeton!honey

Steven M. Bellovin*
AT&T Bell Laboratories
Murray Hill, New Jersey 07974
ulysses!smb

ABSTRACT

Pathalias computes electronic mail routes in environments that mix explicit and implicit routing, as well as syntax styles. We describe the history of pathalias, its algorithms and data structures, and our design decisions and compromises. Pathalias is guided by a simple philosophy: get the mail through, reliably and efficiently. We discuss the principles of routing in heterogeneous environments necessary to make this philosophy a reality.

HISTORY AND OVERVIEW

UUCP,[1] the basic networking component of UNIX,[2] is the backbone of a widespread store-and-forward network. Because setting up new connections is easy, and does not require the intervention of a central administrator, the network has no regular topology. Mail routing is explicitly specified by users. That is, a user who wishes to send mail to hostb using hosta as a relay would write

    mail hosta!hostb!user

When the UUCP network was small and the average connectivity was high, explicit routing was a minor annoyance at worst. Most paths were direct, and only a tiny fraction involved more than one or two hops, so remembering proper paths was easy.

Then came USENET.[3] For several reasons, UUCP routes soon became a major headache. First, many of the universities on USENET had a low degree of connectivity to other UNIX sites, typically with only two or three long-distance links. Second, USENET readers tended to reply along the USENET paths; these were rarely optimal, and were sometimes unusable. Third, as other networks were used for USENET transport, mail reply syntax became complicated by the variety of standards in use.

As USENET grew, it became clear that the UUCP network needed a routing tool, one that took as input a network connectivity graph and generated usable paths to every known destination. Pathalias is such a tool. Given a description of the connectivity of the UUCP network, it produces a "least cost" path to every known site.

Of course, any such effort relies heavily on the quality of the connectivity data. At first, gathering such data was a difficult administrative problem. Very few system administrators were willing to spend time compiling lists of neighbors and associated cost data. Some connections could be inferred from USENET maps, but these data were unreliable and lacked cost estimates. Worse, they tended to understate the connectivity of the network, putting more load on cooperative sites. Because the data were often contradictory and error-filled, it was necessary to inspect and edit the data manually. Thanks to the USENIX Association's UUCP-mapping project,[4] the picture is much brighter today, with timely and accurate data widely available on USENET.

    *   Much of the work was performed at the Department of Computer Science, University of North Carolina at Chapel Hill.
    [1] D.A. Nowitz and M.E. Lesk, "A Dial-Up Network of UNIX Systems," in UNIX Programmer's Manual, Seventh Ed., 1979.
    [2] UNIX is a trademark of AT&T Bell Laboratories.
    [3] S.M. Bellovin and M. Horton, "USENET — A Distributed, Decentralized News System," unpublished manuscript, 1986.

INPUT

The input to pathalias is a description of a directed graph that models the connection topology of electronic networks. Each edge is assigned a non-negative cost value and a "routing operator"; the latter shows what character is used for mail-routing, and whether it appears to the right or left of the destination host.
For example, the graph

            10
        a ------> b
        |
        | 20
        v
        c

is described as follows, assuming that host a uses the UUCP syntax convention of host!user for network mail:

    a    b(10), c(20)

If host a uses the ARPANET syntax of user@host, the description is

    a    @b(10), @c(20)

The @ itself indicates the character used for building an address; its position indicates that the host name is on the right. Thus, the default case may be written explicitly:

    a    b!(10), c!(20)

Many sets of hosts are fully connected, i.e., every host in the set talks to every other. To avoid the necessity of explicitly listing every connection, pathalias supports a network notation. Thus,

    dopey     grumpy(10), sleepy(10)
    grumpy    dopey(10), sleepy(10)
    sleepy    grumpy(10), dopey(10)

can be written as

    UNC-dwarf = {dopey, grumpy, sleepy}(10)

where UNC-dwarf is the name given to that network.

    [4] M.R. Horton, K. Summers-Horton, and B. Kercheval, "Proposal for a UUCP/USENET Registry Host," in Proc. Summer USENIX Conference, Salt Lake City, 1984.

As is common on UNIX, data for pathalias may be read from the standard input or from a list of input files. Typically, each input file represents the data for a given machine or site; file boundaries have semantic implications in the treatment of "private" names and in resolving duplicate connection data.

The definition of a satisfactory cost metric proves troublesome. Possible metrics include the actual telephone cost of a connection, the nominal frequency of contact (many academic sites are passive: rather than calling out, they wait for other sites to call them), or the transmission speed of a connection. We adopt a pragmatic approach: given a choice of paths, we attempt to choose the one that experienced users prefer, as exemplified by existing network traffic. The cost measure is tuned so that pathalias produces routes agreeing with these choices. Using this metric helps balance conflicting concerns.
For example, long distance costs often matter less to large corporations than to universities. On the other hand, sites with autodialers have far more freedom to connect with whom they wish. A pragmatic metric also accounts for differences in the reliability of sites; UUCP was not always as reliable as it is today. Actual transmission speed is less important than one might assume; call setup time and the time between calls tend to be the dominant factors, at least for mail messages.

With the basis for a metric in hand, symbolic names like HOURLY, DAILY, etc. are assigned numeric values. Early on, these numbers were juggled until, in the estimation of experienced users, the paths produced were reasonable.

Dial-up service is specified as DEMAND for a site that is called whenever there is traffic, or its high-grade kin DIRECT, for local phone calls. In the early days, the odds of completing a call were disappointingly low; port contention, line noise, and difficulties with UNIX's baud-rate switching were common problems. (This judgement may have been unduly colored by local experience. Several important sites had few dial-in lines, or were served by antiquated telephone switching equipment.) DEDICATED connections (two machines hard-wired together) are considered much higher-grade. A complete table is shown below.

    Symbol       Value
    ---------    -----
    LOCAL           25
    DEDICATED       95
    DIRECT         200
    DEMAND         300
    HOURLY         500
    EVENING       1800
    POLLED        5000
    DAILY         5000
    WEEKLY       30000

Costs can be expressed as arbitrary arithmetic expressions, mixing numbers and symbolic values.
For example, HOURLY*3 describes a connection that is completed once every three hours. In theory, factors that influence cost are additive; in practice, experience shows that the per-hop overhead in time and reliability is so high that it is important to keep paths short. Thus, for example, DAILY is 10 times greater than HOURLY, instead of 24.

OUTPUT

Although it would be convenient to compute the path to a destination as needed, the calculation is prohibitively expensive. Consequently, pathalias precomputes paths to all destinations listed in the input data; these paths are retrieved as needed. The string %s is included in output paths as a marker to indicate where the user name should be inserted. Use of such a marker enables the generated path to be used directly as a format string for printf.

Consider the following input data (a simplified portion of the map from 1981):

    unc         duke(HOURLY), phs(HOURLY*4)
    duke        unc(DEMAND), research(DAILY/2), phs(DEMAND)
    phs         unc(HOURLY*4), duke(HOURLY)
    research    duke(DEMAND), ucbvax(DEMAND)
    ucbvax      research(DAILY)
    ARPA = @{mit-ai, ucbvax, stanford}(DEDICATED)

If run from unc, the following output is produced by pathalias:

    0       unc         %s
    500     duke        duke!%s
    800     phs         duke!phs!%s
    3000    research    duke!research!%s
    3300    ucbvax      duke!research!ucbvax!%s
    3395    mit-ai      duke!research!ucbvax!%s@mit-ai
    3395    stanford    duke!research!ucbvax!%s@stanford

There are several points worth noting about the output. First, all generated paths route mail through duke, despite the presence of a direct connection to phs from unc. The reason is the cost difference on the links to duke and phs: the direct link from unc to phs costs HOURLY*4 = 2000, while the route through duke costs HOURLY + DEMAND = 800.

Second, mail to ARPANET sites employs mixed-syntax addressing; the path to ucbvax uses UUCP conventions (i.e., the host name on the left, delimited by an '!'), while the ARPANET portion has the host name on the right, delimited by an '@'.

Finally, output from pathalias is a simple linear file, in the UNIX tradition.
If desired, a separate program may be used to convert this file into a format appropriate for rapid database retrieval.

DATA STRUCTURES

Pathalias runs in three phases: parse the input, build a shortest path tree, and print the routes. Each phase manipulates an in-memory representation of a directed graph.
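
The last two phases can be illustrated with a short Python sketch run on the 1981 example from the OUTPUT section. It is not the pathalias implementation: the names GRAPH, extend, routes, and format_routes are hypothetical, and the parse phase is skipped by transcribing the input by hand. It also assumes that Dijkstra's algorithm stands in for the shortest path tree construction, that the ARPA network can be expanded into pairwise '@' edges, and that DAILY/2 is evaluated with integer division.

```python
import heapq

# Symbolic costs taken from the cost table in the INPUT section.
DEDICATED, DEMAND, HOURLY, DAILY = 95, 300, 500, 5000

# Hand-transcribed example input. Each edge is (neighbor, cost, operator):
# "!" places the host name on the left, "@" places it on the right.
GRAPH = {
    "unc": [("duke", HOURLY, "!"), ("phs", HOURLY * 4, "!")],
    "duke": [("unc", DEMAND, "!"), ("research", DAILY // 2, "!"),
             ("phs", DEMAND, "!")],
    "phs": [("unc", HOURLY * 4, "!"), ("duke", HOURLY, "!")],
    "research": [("duke", DEMAND, "!"), ("ucbvax", DEMAND, "!")],
    "ucbvax": [("research", DAILY, "!")],
}

# Assumption: a network declaration is modeled as pairwise edges between
# all of its members, each carrying the network's cost and operator.
ARPA = ["mit-ai", "ucbvax", "stanford"]
for src in ARPA:
    for dst in ARPA:
        if src != dst:
            GRAPH.setdefault(src, []).append((dst, DEDICATED, "@"))


def extend(path, host, op):
    """Splice one more hop into a printf-style path at the %s marker."""
    hop = host + "!%s" if op == "!" else "%s@" + host
    return path.replace("%s", hop, 1)


def routes(graph, source):
    """Return {host: (cost, path)} for every host reachable from source."""
    best = {}
    heap = [(0, source, "%s")]
    while heap:
        cost, host, path = heapq.heappop(heap)
        if host in best:
            continue  # a cheaper route to this host was already settled
        best[host] = (cost, path)
        for nxt, edge_cost, op in graph.get(host, []):
            if nxt not in best:
                candidate = (cost + edge_cost, nxt, extend(path, nxt, op))
                heapq.heappush(heap, candidate)
    return best


def format_routes(table):
    """Render the routes as tab-separated lines, cheapest first."""
    ordered = sorted(table.items(), key=lambda item: (item[1][0], item[0]))
    return [f"{cost}\t{host}\t{path}" for host, (cost, path) in ordered]


if __name__ == "__main__":
    table = routes(GRAPH, "unc")
    print("\n".join(format_routes(table)))
    # The %s marker lets a path serve directly as a format string.
    print(table["mit-ai"][1] % "honey")
```

Under these assumptions the sketch yields the same seven costs and paths listed in the OUTPUT section, and the final line fills the %s marker to give duke!research!ucbvax!honey@mit-ai.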