A Proposed Technique for Tracing Origin of Spam on the Usenet
A proposed technique for tracing origin of spam on the Usenet

by Dirk Bertels, BComp

A dissertation submitted to the School of Computing in partial fulfillment of the requirements for the degree of Bachelor of Computing with Honours

University of Tasmania
June 2006

This thesis contains no material which has been accepted for the award of any other degree or diploma in any tertiary institution. To the candidate’s knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the text of the thesis.

Signed
Dirk Bertels
Hobart, June 2006

Abstract

The Usenet, a worldwide distributed decentralized conferencing system, is widely targeted by spammers who use a variety of techniques to obscure their identity. One of these techniques is called path preload, in which the path header is spoofed by attaching a false section to the beginning of the path. Detecting and confirming path preload is laborious and requires a thorough understanding of the Usenet. This thesis explores a technique that downloads a particular article from several servers and compares the resulting path headers, and assesses its usefulness for detecting path preload.

This document begins with a general background on the Usenet, highlighting those aspects that are relevant to the research, especially the topics of Usenet headers and spam. This leads to a description of the proposed technique and the development of a tool capable of implementing it. The tool essentially downloads a spam article from different servers and analyses the headers of each copy. From the data gathered, a map is constructed showing the servers that the copies traverse in order to reach their destinations. Since each article is downloaded from more than one location, some commonality may be found in these trajectories. If this commonality occurs at the beginning of the trajectory, the article is said to have a common path.
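The comparison described above can be illustrated with a minimal Python sketch. It is not the tool developed in this thesis. It assumes that path entries are separated by "!" and that each relaying server prepends its name, so the rightmost entry is the oldest. The function names and the example host names are hypothetical, and every entry is treated alike, including a trailing placeholder such as "not-for-mail".

```python
from typing import List


def trajectory(path_header: str) -> List[str]:
    """Return the entries of a Path header in travel order (oldest first).

    Assumption: entries are separated by '!' and each relaying server
    prepends its own name, so the rightmost entry is the oldest.
    """
    hosts = [h.strip() for h in path_header.split("!") if h.strip()]
    return hosts[::-1]


def common_path(path_headers: List[str]) -> List[str]:
    """Return the entries shared at the start of every trajectory."""
    trajectories = [trajectory(p) for p in path_headers]
    if not trajectories:
        return []
    shared = []
    # zip stops at the shortest trajectory; compare position by position.
    for hosts in zip(*trajectories):
        if len(set(hosts)) != 1:
            break
        shared.append(hosts[0])
    return shared


# Hypothetical Path headers for the same article fetched from two servers.
paths = [
    "news.a.example!feed.b.example!relay.x.example!origin.example!not-for-mail",
    "news.c.example!relay.x.example!origin.example!not-for-mail",
]
print(common_path(paths))
```

A non-empty result only shows that the copies share the early part of their route. Whether that shared part is genuine or preloaded still has to be judged by other means.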
Once a common path is established, a more heuristic approach is taken to reach preliminary conclusions as to whether this common path is likely to be path preload. This approach requires some knowledge of various aspects of the Usenet and involves additional anti-spam techniques alongside the common path technique. Several sessions have been conducted, and their results are outlined, analysed, and discussed at the end of this document, followed by some thoughts on possible future advances and further improvements.

Acknowledgements

The author would like to thank the following people for their contribution:

Prof. Christopher Lueg, Supervisor
Dr. Vishv Malhotra, Co-supervisor

Index

1. Introduction
2. The Usenet – an overview
   2.1. Origin of the Usenet
   2.2. Current state of the Usenet
   2.3. Usenet Addressing and Newsgroups
        Crossposting
   2.4. The Usenet Networking System
        The UUCP network
        NNTP - The News Network Transport Protocol
   2.5. Request For Comments (RFC) standards
        RFC2822: Standard for Interchange of Usenet messages
        RFC 977: Network News Transfer Protocol
   2.6. Usenet Structure and Propagation
   2.7. Current handling of Usenet traffic
   2.8. Usenet Servers and Clients
        The INN Server software
        NNTP clients - Usenet readers
3. Visualisation - Mapping various aspects of the Usenet
        Tree maps
        Traffic flow maps
        Conversation maps
4. Spamming
   4.1. History of spam
   4.2. Usenet Spam
        Definition of Usenet spam
        Crossposting
        Excessive Crossposting - ECP
        Excessive multiposting (EMP)
        Breidbart Index
   4.3. Spam Cancel
   4.4. Spam and anti-spam Techniques
        Spam traps
   4.5. Spam news servers statistics from 2006
   4.6. Is the Spammer working from a server or a client?
   4.7. Report spam
5. Networking and Spamming Tools
   5.1. Usenet Control Clients
   5.2. Basic networking commands
        dig and nslookup
        whois
        traceroute
   5.3. Additional networking tools
        Geolocation
        Server networking tools
6. Headers
   6.1. General discussion
   6.2. Description of main headers
   6.3. The ‘path’ header
        The posted stamp
        The mismatch stamp
   6.4. The ‘nntp-posting-host’ header
   6.5. The x-trace header
   6.6. The message-id header
   6.7. Other Headers relevant to the research
        The xref header
        The injection-info header
   6.8. Headers of binary newsgroups
7. Common Path and Path Preload
   7.1. Definitions: Path preload, Common Path, Adjacent Server, and Boundary server
   7.2. Why path preload is used
   7.3. When is a common path ‘path preload’?