Internetnews:Ijsenet Transportfor Internetsites Richsalz - Opensoftware Foundation
Total Page:16
File Type:pdf, Size:1020Kb
InterNetNews:IJsenet transportfor Internetsites RichSalz - OpenSoftware Foundation ABSTRACT NNTP, the Network News Transfer Protocol, has been labelled the most widely implementedelective protocol in the Internet. The growth of the Internethas meant more sites-exchangingNNTP data. While the explosivegrowth in Usenettraffic placesdemands on all sites,the goal of fast networkaccess puts particulardemands on NNTP hosts. InterNetNewsis an implementationof the Usenettransport layer designedto address this situation. -It replacesthe standardLJNIx server a¡chitecturewith a single long-running serverthat handlesall incomingconnections. It has provento be quite succéssful,þroviding quick and efficientnews transfer. Introduction The Path header is used to prevent message Ioops. Usenetis a distributedbulletin board system, For example,an articlewritten at A could get built as a logical network on top of other networks sent to B, D, C, and then back to A. Before pro- pagating and connections. By design, messagesresemble an article, a site prependsits own name to standardInternet electronic mail messagesas defined the Path header. Before propagatingan article to a site, the receivinghost checksto in RFC822 [CrockerS2]. The Usenetmessage for- make sure that the mat is described in RFC1036 This site that would receivethe article doesnot appearin [Adams87]. the Path definessome additional headers. It also limits the line. For example,when the article arrived at site C, values of some of the standardheaders as well as the Path would containA!B!D, so site C giving someof them specialsemantics. would know not to sendthe article to.A. Sites Newsgroupsare the classificationsystem of alsokeep a recordof the Message-ID'sof all Usenet. The required Newsgroupsheader specifies articles they currently have. If D receives an article where a message,or article, should be frled upon from B, it will reject the article if C offers it reception. Sites are free to cafiy whatevergroups later. For self-protection,most sites keep a record of they want. Most sites carry the core set of so-callèd recent articles that they no longer have. This is very useful "mainstream" groups. There a¡e currently about when anothersite dumpsa (usuallyquite large) 730 of thesegroups, and one or two new-onesis batchof old newsback out to Usenet. createdevery week. For the pastfew years,the amountof data gen- eratedby Messagesgenerated at a site are sent to the Usenetsites has beendoubling every year. site's"neighbors" who processthem and relay them A site that receives all the mainstreamgroups is to their neighbors,and so on. Sites can be intercon- receivingover 77 megabytesa day spreadout over 11,000 nected- indeed,on the Internet,this is quite com- a¡ticles [Adams92]. About 20Voof the data mon. SeeFigure 1. is article headers,and while all of them must be scannedonly half of it is must be processedby the Usenetsoftware.l The numberof sitesparticipating in Usenethas been growing almost as quickly. Basedon articles his site receivesand survey data sent in by partic! pating sites, Brian Reid estimatesthat there are 36,000sites with 1.4 million participants[Reid91]. A "sendsys" messageto the "inet" distributionin June of 1989 receivedabout 200 replies in the first twenty-fourhours. A year later, nearly 700 replies were received. (Sendsysis a specialarticle that asks all receiving sites to send back an email message, usually without human intervention;by convention, inet is primarily the set of siteson the Internet.) Figure 1: Small Usenet topology (all links æe IYes, two-way). this means that, as far as the software is concerned,Usenet is over90Vo noise. '92 Summer USENIX - June 8-JuneL2,lgg2 - SanAntonio, TX 93 InterNetNews... Salz The NNTP protocol is definedin InternetRFC appropriaternews or relaynewsprogram processed it. 977 [Kantor86] publishedin February,L986. This In order to avoid processingan article the system was accompaniedby the generalpublic releaseof a already has, it first does a lookup on the history referenceimplementation, also called "nntp." This databaseto seeif the articleexists. It soonbecame hasbeen the only NNTP implementationthat is gen- apparentthat invoking relaynewsfor every article erallyavailable to uNIx sites. lost all of C News'sspeed gain, so the NNTP pack- age \¡/as changedto write ã set of articles into a UsenetSoftware batch,and offer the batchto relaynews, In addition to InterNetNews,there are two When articles arrive faster then relaynewscan major Usenetpackages available for UND(sites. All processthem, they must be spooled. If two sites (B threeshare several common implementation details. and C in the previousexamples) both offer a third A newsgroupname such as comp.foois mappedto a site (D) the samearticle at the "same time" then an directory comp/foo within a global spool directory. extra copy will be spooled,only to be rejectedwhen An article posted to a group is assigneda unique it is processed,wasting disk space;this problem increasingnumber basedon a file called the active multiplies- as the number of incoming sites file. If an article is postedto multiple groups,links increases./To alleviatethis problem,most sitesrun are usedso that only one copy of the data is kept. Paul Vixie's msgidd, a daemon that keeps a A sys file containspatterns describing what news- memory-residentlist of article Message-ID'soffered groupsthe site wishes to receive,and how articles within the last 24 hours. The NNTP server is shouldbe propagated.In most cases,this meansthat modifledso that it tells this daemonall of the arti- a recordof the article is written to a "batchfile" that cles that it is handingto Usenetand queriesthe dae- is processedoffJine to do the actualsending. mon before telling the remotesite that it needsthe The first Usenetpackage is called B News,also article. This is not a perfect solution - if the first, knownas BZ.tt, The B newsmodel is verv simole: spooled,copy of the article is lost or corrupted,the the programrnews is run to processeach lnconiing site will likely never be offered the article after the article. Locking is usedto make sure that only one msgidd cache entry has expired. Going further, rnews processtries to updatethe active file and his- msgiddis work-aroundfor a probleminherent in the tory database.At one site that receivedover 15,000 currentsoftware architecture. articlesper day, the locking would often fail so that Otherproblems, while not as severe,lead to the 10 to 100 duplicateswere not uncommon.Because conclusionthat a new implementationof Usenetis eacharticle is handledby a separateprocess, it is neededfor Internetsites. For examnle: impossibleto pre-calcuateor cacheany usefuldata. ¡ Since all articles are spooled,reløynews can- More importantly,file VO had becomea major not tell the NNTP serverthe ultimate disoosi- bottleneck.A site that feeds10 othersites does over tion of the article, and the servet cannol þ[ 150,000 open/append/closeoperations on its its peer at the other end of the wire. This batchfiles. It is generallyagreed that B newscannot hides transmissionproblems. For example,a keepup with cunent Usenetvolume; it is no longer site tracingthe communicationhas no way of being maintained,and its authorhas said more then finding out an article was rejectedbecause the oncethat the softwareshould be considered"dead." remotesite doesnot receivethat narticularset of newsgroups. C News gets much better performancethen B . news by processingarticles in batches The NNTP referenceimplementation is show- [CollyerST]. ing signs of Tlre relaynewsprogram is run severaltimes a day to age. Maintaining the server is becoming a maintenancenightmare; processall the articlesthat have beenreceived since over one-tenthof its 6,800lines the last run. Since only one relaynewsprogram is are#ifdef-related. o All articles are written running,it is not necessaryto do fine-grainlocking to disk at extra time. Disks are gettingbigger, of any of the supportingdata files. More impor- but not faster,while CPU's,memory, tantly, it can keep the entire active and sys file in andnetworks are. memory. It can also use buffered VO on its InterNetNewsarchitecture batchfrles,reducing the amount of system calls by one or two ordersof magnitude. There are four key programsin the lnterNet- Newspackage (see Figure 2): An alpha version of C News was releasedin Innd is the principal news server for incoming October, 1,987. Within four years it surpassed B newsfeeds; news in popularity, and there are now more sites runningC News then ever ran B news. - zThis is quite common for Internet sites, where From the beginning, the NNTP reference redundantfast newsfeedsare commonand wheremany implementationwas layered on top of the existing Usenetadministrato$ seem to be avid playersof the Usenetsoftware: an article receivedfrom a remote "exchangenews with asmany other people as possible" NNTP peer was written to a temporaryfile and the game. 94 Summer'92 USEND(- June8-June 12, lgg2 - SanAntonio, TX Salz InterNetNews ... in¡pcmítreads a file identifying articles and offers Until recently, ínramit used writev to send its them to anothersite; datato the remotehost. At start-upit filled a three- ctlìnnd sendscontrol commandsto inndi element struct iovec array with the following ele- nnrpd is an NNT? serveroriented for newsreaders. ments: Of theseprograms, the most importantis innd. t0] { ".", 1 }; We frrst mentionenough of its architectureto give a tll I placeholder!¡ contextfor the other programs,and then discussits t2l { "\r\n" } designand featuresin more detail