Quick viewing(Text Mode)

Internetnews:Ijsenet Transportfor Internetsites Richsalz - Opensoftware Foundation

Internetnews:Ijsenet Transportfor Internetsites Richsalz - Opensoftware Foundation

InterNetNews:IJsenet transportfor Internetsites RichSalz - OpenSoftware Foundation

ABSTRACT

NNTP, the Network News Transfer Protocol, has been labelled the most widely implementedelective protocol in the . The growth of the Internethas meant more sites-exchangingNNTP data. While the explosivegrowth in Usenettraffic placesdemands on all sites,the goal of fast networkaccess puts particulardemands on NNTP hosts. InterNetNewsis an implementationof the Usenettransport layer designedto address this situation. -It replacesthe standardLJNIx a¡chitecturewith a single long-running serverthat handlesall incomingconnections. It has provento be quite succéssful,þroviding quick and efficientnews transfer.

Introduction The Path header is used to prevent message Ioops. is a distributedbulletin board system, For example,an articlewritten at A could get built as a logical network on top of other networks sent to B, D, , and then back to A. Before pro- pagating and connections. By design, messagesresemble an article, a site prependsits own name to standardInternet electronic mail messagesas defined the Path header. Before propagatingan article to a site, the receivinghost checksto in RFC822 [CrockerS2]. The Usenetmessage for- make sure that the mat is described in RFC1036 This site that would receivethe article doesnot appearin [Adams87]. the Path definessome additional headers. It also limits the line. For example,when the article arrived at site C, values of some of the standardheaders as well as the Path would containA!B!D, so site C giving someof them specialsemantics. would know not to sendthe article to.A. Sites Newsgroupsare the classificationsystem of alsokeep a recordof the Message-ID'sof all Usenet. The required Newsgroupsheader specifies articles they currently have. If D receives an article where a message,or article, should be frled upon from B, it will reject the article if C offers it reception. Sites are free to cafiy whatevergroups later. For self-protection,most sites keep a record of they want. Most sites carry the core set of so-callèd recent articles that they no longer have. This is very useful "mainstream" groups. There a¡e currently about when anothersite dumpsa (usuallyquite large) 730 of thesegroups, and one or two new-onesis batchof old newsback out to Usenet. createdevery week. For the pastfew years,the amountof data gen- eratedby Messagesgenerated at a site are sent to the Usenetsites has beendoubling every year. site's"neighbors" who processthem and relay them A site that receives all the mainstreamgroups is to their neighbors,and so on. Sites can be intercon- receivingover 77 megabytesa day spreadout over 11,000 nected- indeed,on the Internet,this is quite com- a¡ticles [Adams92]. About 20Voof the data mon. SeeFigure 1. is article headers,and while all of them must be scannedonly half of it is must be processedby the Usenetsoftware.l The numberof sitesparticipating in Usenethas been growing almost as quickly. Basedon articles his site receivesand survey data sent in by partic! pating sites, Brian Reid estimatesthat there are 36,000sites with 1.4 million participants[Reid91]. A "sendsys" messageto the "inet" distributionin June of 1989 receivedabout 200 replies in the first twenty-fourhours. A year later, nearly 700 replies were received. (Sendsysis a specialarticle that asks all receiving sites to send back an message, usually without human intervention;by convention, inet is primarily the set of siteson the Internet.) Figure 1: Small Usenet topology (all links æe IYes, two-way). this means that, as far as the software is concerned,Usenet is over90Vo noise.

'92 Summer USENIX - June 8-JuneL2,lgg2 - SanAntonio, TX 93 InterNetNews... Salz

The NNTP protocol is definedin InternetRFC appropriaternews or relaynewsprogram processed it. 977 [Kantor86] publishedin February,L986. This In order to avoid processingan article the system was accompaniedby the generalpublic releaseof a already has, it first does a lookup on the history referenceimplementation, also called "nntp." This databaseto seeif the articleexists. It soonbecame hasbeen the only NNTP implementationthat is gen- apparentthat invoking relaynewsfor every article erallyavailable to uNIx sites. lost all of C News'sspeed gain, so the NNTP pack- age \¡/as changedto write ã set of articles into a UsenetSoftware batch,and offer the batchto relaynews, In addition to InterNetNews,there are two When articles arrive faster then relaynewscan major Usenetpackages available for UND(sites. All processthem, they must be spooled. If two sites (B threeshare several common implementation details. and C in the previousexamples) both offer a third A newsgroupname such as comp.foois mappedto a site (D) the samearticle at the "same time" then an directory comp/foo within a global spool directory. extra copy will be spooled,only to be rejectedwhen An article posted to a group is assigneda unique it is processed,wasting disk space;this problem increasingnumber basedon a file called the active multiplies- as the number of incoming sites file. If an article is postedto multiple groups,links increases./To alleviatethis problem,most sitesrun are usedso that only one copy of the data is kept. Paul Vixie's msgidd, a daemon that keeps a A sys file containspatterns describing what news- memory-residentlist of article Message-ID'soffered groupsthe site wishes to receive,and how articles within the last 24 hours. The NNTP server is shouldbe propagated.In most cases,this meansthat modifledso that it tells this daemonall of the arti- a recordof the article is written to a "batchfile" that cles that it is handingto Usenetand queriesthe dae- is processedoffJine to do the actualsending. mon before telling the remotesite that it needsthe The first Usenetpackage is called B News,also article. This is not a perfect solution - if the first, knownas BZ.tt, The B newsmodel is verv simole: spooled,copy of the article is lost or corrupted,the the programrnews is run to processeach lnconiing site will likely never be offered the article after the article. Locking is usedto make sure that only one msgidd cache entry has expired. Going further, rnews processtries to updatethe active file and his- msgiddis work-aroundfor a probleminherent in the tory database.At one site that receivedover 15,000 currentsoftware architecture. articlesper day, the locking would often fail so that Otherproblems, while not as severe,lead to the 10 to 100 duplicateswere not uncommon.Because conclusionthat a new implementationof Usenetis eacharticle is handledby a separateprocess, it is neededfor Internetsites. For examnle: impossibleto pre-calcuateor cacheany usefuldata. ¡ Since all articles are spooled,reløynews can- More importantly,file VO had becomea major not tell the NNTP serverthe ultimate disoosi- bottleneck.A site that feeds10 othersites does over tion of the article, and the servet cannol þ[ 150,000 open/append/closeoperations on its its peer at the other end of the wire. This batchfiles. It is generallyagreed that B newscannot hides transmissionproblems. For example,a keepup with cunent Usenetvolume; it is no longer site tracingthe communicationhas no way of being maintained,and its authorhas said more then finding out an article was rejectedbecause the oncethat the softwareshould be considered"dead." remotesite doesnot receivethat narticularset of newsgroups. C News gets much better performancethen B . news by processingarticles in batches The NNTP referenceimplementation is show- [CollyerST]. ing signs of Tlre relaynewsprogram is run severaltimes a day to age. Maintaining the server is becoming a maintenancenightmare; processall the articlesthat have beenreceived since over one-tenthof its 6,800lines the last run. Since only one relaynewsprogram is are#ifdef-related. o All articles are written running,it is not necessaryto do fine-grainlocking to disk at extra time. Disks are gettingbigger, of any of the supportingdata files. More impor- but not faster,while CPU's,memory, tantly, it can keep the entire active and sys file in andnetworks are. memory. It can also use buffered VO on its InterNetNewsarchitecture batchfrles,reducing the amount of system calls by one or two ordersof magnitude. There are four key programsin the lnterNet- Newspackage (see Figure 2): An alpha version of C News was releasedin Innd is the principal for incoming October, 1,987. Within four years it surpassed B newsfeeds; news in popularity, and there are now more sites runningC News then ever ran B news. - zThis is quite common for Internet sites, where From the beginning, the NNTP reference redundantfast newsfeedsare commonand wheremany implementationwas layered on top of the existing Usenetadministrato$ seem to be avid playersof the Usenetsoftware: an article receivedfrom a remote "exchangenews with as manyother people as possible" NNTP peer was written to a temporaryfile and the game.

94 Summer'92 USEND(- June8-June 12, lgg2 - SanAntonio, TX Salz InterNetNews ...

in¡pcmítreads a file identifying articles and offers Until recently, ínramit used writev to send its them to anothersite; datato the remotehost. At start-upit filled a three- ctlìnnd sendscontrol commandsto inndi element struct iovec array with the following ele- nnrpd is an NNT? serveroriented for newsreaders. ments: Of theseprograms, the most importantis innd. t0] { ".", 1 }; We frrst mentionenough of its architectureto give a tll I placeholder!¡ contextfor the other programs,and then discussits t2l { "\r\n" } designand featuresin more detail at the end of this section. To write a line, the placeholderwas filled in with a pointerto the buffer, and its length,and a single wrÈ fey was done, starting from either elementzero or one. While this implementationwas clever, and l-l simpler then what was done elsewhere,it was not very fast. Innxmit nortr uses a 16k buffer and only RemoteFeeds l-1 doesa write when the next line would not fit. This is also consistentwith ideasused throughout the rest l-] of INN: use the read and write svstem calls. referencingthe dataout of large bufferswhile avoid- Local newsreaders ing the copying commonlydone by the standardI/O library. The ctlinnd program is used to tell the innd serverto perform specialtasks. It doesthis by com- Figure 2: Innd architecture municating over a ut¡x-domain datagram socket. The socket is behind a firewall directorv that is Innd is a single daemon that receives all mode 770, so that only members of íhe news incomingarticles, files them,and queuesthem up for admínistrationgroup can sendmessages to it. It is a propagation.It waits for connectionson the NNTP very small programthat parsesthe frrst parameterin port. \ilhen a connectionis received,a getpeer- its commandline and convertsit to an internalcom- name(2)is done. If the host is not in an accessfile, mand identifrer. This identifrer and the remaining then an nnrpd processis spawnedwith the connec- parametersare sent to innd which processesthe tion tied to its standardinput and output.sIt is worth command,and sendsback a formatted reply. For fioting that nnrpd is only about 3,500lines of code, example,the following commandsstops the server and 20Voof them are for the "POST" command, from acceptingany new connections,adds a news- usedto verify the headersin a user's article. Nnrpd group,and then tells it to recomputethe list of hosts providesall NNTP commandsexcept for "IHAVE" that are authorizednewsfeeds: and an incompleteversion of "NEïVNEWS". On ctlinnd pause "Clearing out log files" the other hand, it does provide extensionsfor ctlinnd nevrgroupcomp.Eourcee.unix m pattern-matching \ on an article header and listing vixie€pa. dec.com exactlywhat articlesare in a group. The NNTP pro- ctlinnd reload newsfeedE"Added osF feed. tocol seemsto be a good exampleof the uNx pliilo- ctlinnd go " " sophy: it is small, general,and powerful and can be The text argumentsare implementedin a very small program. sent to sys/og(8) for audit purposes. Articles are usually forwardedby havng innd The most commonly-usedctlinnd record the article in a "batchfile" which is pro- commandis "flush." This directs the server to cessedby anotherprogram. For Internet sites, the close the batchfilethat is openfor a site, and is ínnxnit programis used to offer articlesto the host typically used as follows: specifiedon its commandline.l The input to innxmít is a set of lines containinga pathnameto the article nv batchfile batchfile.work and its Message-ID.Since the Message-IDis in the ctlinnd flush sitename batchfile, innxmit does not have to open the article innxmit sitename batchfite.work and scan it before offering the article to the remote The site. This can give significantsavings if the remote flush commandpoints out anotherdiffer- ence site alreadyhas a percentageof the articles. betweenINN and other Usenetsoftware. The B News ìnewsprogram needed no extemallocking - files were openedand closed for a very short wln- dow, the time neededto processone article. The C @ons, no singleINN program News relaynel"scould be runningfor a longerperiod im¡lementsthe entire NNTP protocol. rOther of time. The only way to get accessto a batchfrleis programs,hke nntplink,are supportedbut not to either lock the entire news system,which is over- partof theINN dist¡ibution. kill for the desiredtask. or to renamethe file and

'92 Summer USENIX- June8-June 12, Lggz- SanAntonio, TX 95 InterNetNews... Salz wait until the original name shows up again. The ng->SÍtesIi]->Sendit = TRUEi INN approachis more effrcient and conceptually ) cleaner. ) for (sp = Sitesi sp < Lastsitet sp++) { Innd Structure if ( lep->Sendit) When innd startsup it readsthe active file into continue; - *pi memory. An anay of MWSGROUP structuresis for (p sp->FileFlagsi p++) Ewitch (*p) { created,one for each newsgroup,that containsthe 'm'! following elements: cage /* I{rite Message-ID*/ char *Namei /* "comp.eourceg.unix */ cage 'n': char *Diri /* "co¡np/sourceg/unix/" */ /* vtrite filename */ long LaEti lt 02LL t / int LâEtlvidtht /* 5 */ char *LastString; /* "002LI..." ,/ ) char *Resti /* "nì\n..." t/ ) int SitecounÈi /* ! */ The ARTDATA structurecontains information about SITE **Siteai defined */ /* below the currentarticle such as its size, the host that sent The C commentsabove show the data that would it, and so on. The MeetsSiteCriterinfunction is an generatedfor the following line in the activefile: abstractionfor the in.line teststhat are doneto seeif comp.sources.unix 0021.100202 m an articlereally shouldbe propagatedto a site (e.g., checking the Path header as described above). The Last field specifresthe nameto be given to the AssignNameis describedbelow. next article in the group. The LastString element At its core, innd is an IiO scheduler points into the in-memory copy of the frle. This that makes callbackswhen select(2) numberis carefully formattedso that the file can be hasdetermined that thereis activity memory-mapped,or updatedwith a singlewrite. on a descriptor. This is encapsulatedin the CHANNEL structure,which has the following ele- A hash table into the structurearay is built, ments: using a functionprovided by Chris Torek [Torek9l]. The hashcalculation is very simple, yet empirically enuln TYPE Type; it gives near-uniformdistribution. The secondary enum STATE Statei key is the highestarticle number,so groupswith the int fd; most traffic tend to be at the top of the bucket. FUNCPTR Reader; FUNCPTR WTitEDONC; The INN equivalentof the sys file read next. BUFFER Ini An anay of SITE structuresis created,one for each BUFFER Outi site, that containsthe following elements: BOOL Sendit; The Type field is used for internal consistency checks. There four different - char FileFlags t 10I ; types of channels local-accept,remote-¿cc€pt, local-control (used by The FileFlags anay specifies what information ctlinnQ and NNTP connection. Each type is imple- shouldbe written to the site's data streamwhen it mentedin anywherefrom 100to 1200lines of code. receivesan article. The subscriptionlist for the site The Reader and WriteDone function pointers, and is then parsed,and for all the newsgroupthat it the State enumerationare used for protocol-specific recives, the matchingNEWSGROUP structure will data. For example,State freld is usedby the NNTP containa pointerto the SITE structure. channelcode to determinewhether the site is send- Using these two structuresit is easy to step ing an NNTP commandor an a¡ticle. The BUFFER throughhow an article is propagated: datatypecontains sized reusableI/O buffers that extern ARTDATA*arti grow asneeded. extern SITE *Sites, *Lastsj,tei At start time innd calls getdtablesize(2)to extern int nsitesi create an array of channelsthat can be directly char **ppi indexedby descriptor. SITE *spi NEVISGROUP*Ng' The codeto listenon the NNTP port is showin int i; Figure 3. When a host connectsto the NNTP port, select(2) will report activity on the descriptorand while ("pp) { call RemoteReaderwhich will acceptthe connection ng = llashNewsgroup(*pp++)i and possibly if (ng == NULL) createûll in a new CHANNEL out of continue; the resultantdescriptor. ÀssignNane(ng); It took a bit of effort to write the callbackloop for (i = 0i i < ng->nsites¡ i++¡ 1 so that it was fair - i.e., so that the lowestdescrip- Íf (MeetsSiteCrÍtera(ng->Sites Ii], art) ) tors did not get priority treatment.The problemwas

96 Summer '92 USENIX - June 8.June 12, t992 - San Antonio, TX Salz InterNetNews ...

complicatedbecause other parts of the servercan This is a very fast way of writing the article to disk; add and removethemselves from the select(2) read it avoidsextra memory copies,and is only possible andwrite mask,.asneeded. becausethe entire article is kept in memoryuntil the Oncethe NNTP channelhas beencreated for a lastmoment. site, the serveris ready to acceptarticles from that site. The readerfunction for NNTP channelsreads Future work as much data as is availablefrom the descriptor. If RFC 977 follows the SMTP protocol for send- it is in "get a command"state, it looksfor a simple ing text: line are terminatedwith \r\n, a period is \rh terminator;if it is in "readingan article" state, placed before all lines starting with a period, and it looks for a "\r\n.\r\n" terminator. If not found, it datais terminatedwith a line consistingof a single just returns;the datawill becomeavailable at some period [Postel82]. Innd mtst scan the text of ã[ point. If the terminator is found, it processesthe aficles it receives and convert them to standard data in the buffer. For ûling an article, this means UNIXformat. On the transmissionside, dnnrzif must cleaningup the NNTP text escapes,and calling the read the articlesa line at a time in order to add the article abstractionto processthe article. extra data. If all newsreadingis done via NNTp; Processingthe article is the largestsingle rou- then a¡ticles could be storedãirectly in NNTp for- tine in the server. TllreAssignNarae shown above mat, and inrumit could read and write the a¡ticle in incrementsthe high-watermark for the newsgroup. two systemcalls. The innd gains would not be as If the articlehas already been written to disk,ã lint dramatic,but tests show it would still be somewhat is made underthe new name. (Symboliclinks, if measurable. available,can be usedif the spoolarea spans parti- Thereis no NNTP "TURN" command,so that tions.) If the article has not been written, a a single connectioncannot be used for bidirectional struct iovec anay is set up as shown below, where article transfer. Turn-aroundis verv successfulon verticalbars separate each iovec element: UUCP over conventionalphone linei, but seemsof First limited use on higher-performancenetwork links. headers. . . ,,TURN" Path: n¡ih The SMTP protocolhas had a command lr,ocar,_eaTH_pREFlxI rest of vsv..... - - . Secondheaders... since its inception,but it has receivedno practical use. Severalpeople I XREF_HEADERI find the idea of addingoutgoing transferto innd attractive,since it is alreadystruc. larticle body. tured for multi-host I/O and the idea of caching int RemoteReader( cp ) CHAIiINEL*CP; { int newfd; struct sockadr_in remotei ínt remsize; = newfd accept(cp->fd, &remote, &remsize) ; if ( InAccessFile(remsocket) ) CHANcreate(newfd, TypEnntp, STATEgetcmd,NCreader, NCwrítedone) ; else { ForkAndExec("nnrpd", newfd) ; close(newfd); ) ) int Remotesetup ( ) { int fd;

= fd GetSocketBoundToNNTpport(); cHAli¡create(fd, TypEremote, srATEristen, RemoteReader, NULL);

Figure 3: Listeningon an NNTP port

'92 - Summer USENIX June E:JuneL2,lgg2 - SanAntonio, TX 97 InterNetNews... Salz recent articles in memory has its appeal. Adding [Adams92]Rick Adams, Total traffic through uuner outgoing transfer to innd would take a moderate for the last 2 weelcs, Usenet message effort. <<[email protected]Þin news, listS, April, t992. Conclusionsand Comparisons [CollyerST]Geoff Collyer and Henry Spencer,lVews The InterNetNewsarchitecture works. Profiling Need not be SIow, Usenix Winter Conference, a productioninstallation for 24 hours showedthat t987. open(Z) accountedfor LIVI of the run time. Since [CrockerS2]David H. Crocker,Standard for the For- the serveronly does oneopen(2) per article,it is not mat of ARPA Internet Text Messages,Request clear if any other performancetuning is needed. For Comments 822, Marina del Rey, CA: The profiling overheadaccounted for 5Voof the run- InformationSciences Institute, 1982. time. [Kantor86]Brian Kantor, Phil Lapsley, Network Several optimizations are available because NewsTransfer Protocol: A ProposedStarúard there is only one process,and becauseit is always for the Stream-BasedTransmission of News, running. For example, avoiding duplicates is an Requestfor Comments977, Marina del Rey, integralpart of the server. If a secondsite offers an CA: InformationSciences Institute, 1.986. articlewhile a first site is sendingit, the NNTP code [Postel82]Jonathan B. Postel,Simple Mail Transfer will put the channelto "sleep" for a short while Protocol, RequestFor Comments821, Marina beforereplying to the secondoffer. This is usually del Rey, CA: Information SciencesInstitute, enoughtime to have the first site finish sendingthe 1982. article, reducingthe numberof duplicatesfrom hun- [Reid91]Brian Reid, Usenct ReadershipSwntnary dredsto nearlynone, with no externalprograms. Report fo, May 91, Usenet message Since the serveris alwaysrunning, the system <[email protected]> in has a much smoother performancecurve. As a news,lßts,June, 1991. result,it "feels" much fasterto users. [Torek91]C]nis Torek, Hash function for text in C, Usenetmessage <[email protected]> Another unexpectedbenefit is that æticles æe in comp.Iang,c, October, acceptedor rejectedsynchronously. A usercan post 1990. an article, and by the time their posting agent has Author Information returned,it has been w¡itten to the spool directory and queuedfor remote transfer. If there is a prob- Rich Salz is a Senior SoftwareEngineer at the lem such as having an illegal newsgroupspecified, Open Software Foundation,where is a memberof the userfounds out immediately. the DCE group. His current areasof concentration are RPC and the distributedtime service. joined The design of the server seems to be very He OSF after working at BBN for nearly five years, good, split into abstractionsthat are very indepen- working on the Cronus Distributed Programmíng dant. For example, sites have no knowledge of Environment. Rich attended MIT. He can be incommingNNTP connections.Using callbackslets reachedvia U.S. Mail at OpenSofnvare eachportion of the serversafely do I/O without wor- Foundation; 11 Cambridge Center; Cambridge, MA 02L42. rying that it might affect other parts. Much of the Reachhim electronicallyat [email protected]. Usenetprocessing becomes trivial when serialized, suchas accessto the history file. The designhas also led to a fairly small pro- gram: it is under 13,000 lines, and about 20/o of. them a¡e comments. This comparesfavorable to the 7,400 lines in the equivalentC News programand the 7,600 lines in the NNTP referenceimplementa- tion. Availability The INN packageis freely redistributable,and is available for anonymousFTP from ftp.uu.netas -ftp/news/inn.tar.Z. It is discussedin the Usenet newsgroupsnews. software.nntp and news.softwa¡e.b. References [Adams87]Rick Adams, Mark Horton, Standardfor Interchangeof USENETMessages, Request For Comments1036, Marina del Rey, CA: Infor- mationSciences Institute. 1987.

9t Summer'92 USEND(- Junet-June 12,L992 - SanAntonio, TX