Studying Online Newsgroups Danyel Fisher and Christopher Lueg
Total Page:16
File Type:pdf, Size:1020Kb
Appendix Studying Online Newsgroups Danyel Fisher and Christopher Lueg This program posts news to thousands of machines throughout the entire civi lized world. Your message will cost the net hundreds ifnot thousands ofdollars to send everywhere. Please be sure you know what you are doing. Are you absolutely sure that you want to do this? [yIn] Warning message displayed by original "nn" newsgroup program before posting A.1 Introduction Several of the chapters in this volume have discussed studying online newsgroups. The raw data they tap into is commonly available; however, it can be quite a chal lenge to mine that data for useful information. This appendix is meant to serve two purposes. First, it is a basic introduction to the day-to-day structure of Usenet. It emphasises the threading structure of messages, and how those threads relate to news groups; it discusses the hierarchy of group names, and how those hierarchies are formed; and it discusses the life and death of newsgroups. It is meant as a general continuation of Pfaffenberger's chapter (this volume); it also draws on some ofthe information presented by Smith (this volume). Second, it is a technical introduction to dealing with Usenet data. For many pro jects, it is easy enough to track some current messages from Google's "Groups" archive, or to count the number of posters at Netscan (Smith, this volume); however, to actually look at the content ofmessages still requires downloading and reading them all. This section briefly discusses some of the tools that can be useful in downloading and reading Usenet groups. It assumes some knowledge of pro gramming - currently, to the best of our knowledge, there are no public domain tools for Usenet analysis - but may be of some use as a list of starting points. Many of these facts may also be applicable to other online newsgroups and dis cussion boards; Usenet, as the granddaddy ofso many ofthem, has been an impor tant social shaper of the norms and understandings of these groups. A.2 Posting to Usenet Usenet is a distributed bulletin board: messages posted to one machine are sent around the world and readable on any other machine. Because ofthe Usenet prop agation mechanism (discussed in section A.3), there is no definite order in which 254 From Usenet to Co Webs messages will arrive; however, almost all messages make it through. There are no central controls on who can post a message; rather, the system is open to all. Instead, site administrators have the ability to interrupt or drop messages. This makes for a somewhat anarchic system; messages flow constantly from all over the world. But there is a way to organize to the messages. Each message is placed within one of more 'newsgroups;' software clients are designed to aid in reading those groups. A user might choose to read only the newest messages in comp.lang.python, for example, or rec.cooking.recipes. The former, by its name, is for discussion of the computer language Python; the latter is for exchange of recipes. A.3 Usenet Propagation Usenet is interesting partially in that it acts in a way fairly unlike other online sources. It is a fully decentralized system; there is no central authority of the Usenet system. Email messages, for example, are routed and then sent directly: my message to you goes through the smallest number of servers it can, Usenet mes sages, in contrast, are propagated indirectly from machine to machine. Using a pro tocol known as NNTP, each sites subscribes to one or more "feeds" from other sites. Each site, therefore, can only receive the messages that are sent on by the sites around them. There is no canonical central core of Usenet; rather, each site has a partial control of what is seen by those sites downstream from it. The impact of this is lessened slightly because each site can subscribe to feeds from several dif ferent sources. One of the implications of this is that it becomes important to know who is upstream of a site: because Usenet propagation is decided on a site-by-site basis, the messages that are visible from one site might be different than those from another; some upstream sites choose not to carry certain types of messages. For example, Figure A.I shows two headers of messages arriving at the University of California at Irvine Paths are read backwards, and are separated by"!" marks. The first message has wound its way from Indiana to the University of Illinois, to Oregon, to Cal State Long Beach. The second was posted at Google.com, has passed through Syracuse university, San Diego State, and Cal State Long Beach before reaching Irvine. In Figure A.2 we see the reverse - one author of this piece sent a message to a newsgroup to be checked by the other. The message originated at UC Irvine; it travelled a rather tortuous path to Switzerland, where it was read. In the meantime, it had passed through San Diego State and Oregon before going to Syracuse. This one spent more time at non-academic sites - the international networks icl.net and mediaways.net carried it - before sending it, at last, to the Swiss university. news.service.uci.edu!csulb.edu!canoe.uoregon.edu !Iogbridge.uoregon.edu!vixen.cso.uiuc.edu!news.indiana.edu!ahabig news.service.uci.edu!csulb.edu!newshub.sdsu.edu !news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!feeder.qis.net !sn-xit-02!supernews.com!postnews1.google.com!not-for-mail Figure A.I Paths of messages to California. Appendix 255 news.ifi.unizh.ch!news-zh.switch.ch!feedme.news.mediaways.net !newsfeed.icl.net!news.maxwell.syr.edu!logbridge.uoregon.edu !ihnp4.ucsd.edu!news.service.uci.edu!not-for-mail Figure A.2 The path of a message from California to Switzerland. Each of those servers had its own policy for which messages to pass on, and how often. Each of them cares about a different selection of newsgroups. With a web this tangled, it becomes clear why different sites can see a different selection. This also means that there is no canonical view of Usenet: the sites that Netscan sees on the Microsoft campus, and the order they arrive, may be quite different from the messages arriving at Google's headquarters in Palo Alto. More confusingly, there are many different types of software meant to help handle administer Usenet traffic. Usenet traffic has always been extensive, and it has grown even more in the last few years. Unfortunately, much of that growth has been in spam - undesirable, off-topic advertisements - so a number of sites have started to carefully constrain which messages they pass on. There is no longer such a thing as an unconstrained Usenet feed of all groups. A.4 The Birth and Death of Newsgroups Pfaffenberger (this volume) gives a good overview of how groups are created. In classic Usenet fashion, there are three or four distinct methods. One of them is voting. To keep the hierarchy ofimportant topics well-organized, there are specific conditions under which a new group can be added to the "big 8" hierarchy. Essentially, a new group requires a period of discussion and comments, followed by a period of online voting. Current members involved in the parent group are expected to vote on whether a new group would serve a distinct population, and whether the additional complexity in the hierarchy would be worth the additional specificity. (For example, a regular country music listener might vote against a new country music group because he wants to minimize the number of groups he checks daily for new messages, or might vote for it in order to ensure he has a com munity of dedicated fans to tap into.) Email votes for and against are counted, and ifover half of the votes come out in favour, then the group is instituted. The voting is done, to some extent, on the honour system: although votes are kept and pub lished by email address, there is no way to ensure that some addresses are not mul tiply maintained. Still, the system often works out to be a fair one, with the exception of some notable abuses. One notable explosion was the notorious rec.music.white-power vote. When members of the newsgroup rec.music became irritated at certain political attitudes in their group, some members started a campaign to split off their own special interest. Unfortunately, word got out the net. Well-meaning net users, wanting to strike a blow against white separatists, sent in thousands of malformed "no" votes. The exhausted moderator eventually declared the election a failure, and rec.music.white-power was not created. 256 From Usenet to Co Webs However, those parties did find other places to create their group. As Pfaffenberger explains, there are other hierarchies, not subject to votes. One, the alt groups, are collectively known as the alternet. Groups there are created informally, based on discussions in the group alt.config. Because anyone can create an alt group fairly easily, the propagation ofthose groups is much less standardized. It is impos sible to keep up with the tens of thousands of groups that are created and destroyed constantly. Within the groups, many alt messages are not distributed very far, are filtered rapidly by destination sites, or are kept in a spool of far fewer messages than messages in the 'big 8: There are many joke groups alt.Swedish.chef.bork.bork.bork is named after a humorous character on a children's television show, and alt.fan.star-trek.sexy.bald.captain is a fan group for Patrick Stewart's role in the 1980s Star Trek: The Next Generation TV show.