The Nuts and Bolts of a Forum Spam Automator
Total Page:16
File Type:pdf, Size:1020Kb
The Nuts and Bolts of a Forum Spam Automator Youngsang Shin, Minaxi Gupta, Steven Myers School of Informatics and Computing Indiana University, Bloomington {shiny,minaxi}@cs.indiana.edu, [email protected] Abstract The effects of forum spamming have been studied be- fore [18, 21]. In contrast, we focus on how spammers Web boards, blogs, wikis, and guestbooks are forums fre- actually post spam links on forums since understand- quented and contributed to by many Web users. Unfor- ing their modus operandi can offer important insights for tunately, the utility of these forums is being diminished mitigating forum spamming. Toward our goal, we survey due to spamming, where miscreants post messages and forum spam automator tools including XRumer, SEnuke, links not intended to contribute to forums, but to adver- ScrapeBox, AutoPligg, and Ultimate WordPress Com- tise their websites. Many such links are malicious. In ment Submitter (UWCS) for their functionality. These this paper we investigate and compare automated tools tools enable forum spammers to automatically post spam used to spam forums. We analyze the functionality of the to a large number of target forums. most popular forum spam automator, XRumer, in details We explore XRumer [7] in detail. It is one of the most and find that it can intelligently get around many prac- popular forum spamming automators on blackhat SEO tices used by forums to distinguish humans from bots, all forums, in fact perhaps the most, and has the most ex- while keeping the spammer hidden. Insights gained from tensive set of features. These include the ability to au- our study suggest specific measures that can be used to tomatically register fake accounts at target forums, build block spamming by this automator. appropriate browsing history for each forum, and post 1 Introduction spam. XRumer is capable of posting at forums built on many software platforms, such as phpBB and vBul- With millions of websites on the Internet, attracting vis- letin. Additionally, it can be made to learn new fo- itors to a given site is non-trivial. Website operators, rum platforms it does not directly support. XRumer particularly of unsavory sites, are always on the look- can also circumvent spam prevention mechanisms com- out for new mechanisms to make their websites visible. monly employed by forum operators (e.g., automatically While link embedded email spam continues to be a popu- solving CAPTCHAs). To keep spammers hidden and to lar technique for driving traffic to such sites, increasingly, avoid blacklisting, XRumer allows the use of anonymiz- search engines are being exploited as well. The latter ac- ing proxies which can hide the IP addresses of spamming tivity is so pervasive it is termed web spamming. Specif- machines. Further, it can adjust spamming patterns along ically, web spamming exploits algorithms used by pop- the dimensions of time and message content to thwart de- ular search engines in order to gain better rankings with tection. The result is a sophisticated tool designed to by- respect to other sites on the Web. While many tricks are pass most prevention techniques, and one that can adapt used to achieve the goal, including building link struc- to changes in forum software platforms. The picture is tures favored by search engines, the focus of this paper not entirely gloomy however, for we find several quirks is forum spamming, where miscreants post links to their in XRumer’s functionality that we believe can be opera- websites on forums frequented by Internet users. A fo- tionalized into tools that can help mitigate forum spam. rum is a website where visitors can contribute content. Examples of forums include web boards, blogs, wikis, 2 Methodology and guestbooks. Forums help miscreants in two ways: They help drive traffic to a site directly, and simulta- XRumer has separate demonstration and production ver- neously increase search-engine rankings for the linked sions. The demo version contains the documentation of websites. Forums are an attractive target for miscreants the full version but has limited functionality. We con- because forums with useful content cannot be blacklisted sidered using the production version for this paper as or taken down. While search engines often have a global Motoyama et al. in [17] did, testing the economics of view of the link structure which permits them to discount CAPTCHA solving. However, we were faced with the some forum spam [23, 14, 11, 9], it does not effectively moral dilemma that in order to test the efficacy of vari- help forum operators keep their forums free of spam. Ul- ous production features of XRumer, we would have cre- timately, it is up to forum administrators to remove spam ate fake accounts and post spam on real-world forums, links or prevent their initial posting! against their terms of use. Short of posting on actual forums, we would simply be posting spam on custom in- Hrefer uses the keywords to send search queries. house forums, which would preclude testing of various To send search queries, XRumer allows a spammer production features in their entirety. This dilemma was to choose between Google Web Search, Google Blog not present in Motoyama’s work. Therefore, we decided Search, MSN, Yahoo, AltaVista, Yandex [8], and board- to work with the demo version of XRumer 5.05. It allows reader [1]. Each of these search engines provides APIs registering an account at a forum and then posting a pre- to automate searches but Hrefer does not use them be- defined message. We could test this on in-house forums cause APIs provide restricted search results and presum- we setup for the purpose, avoiding any dilemmas. able because they can be used to trace a spammer’s ac- We set up our own forum server and observed the tivity. So, Hrefer tries to mimic web browser behavior. It packets XRumer generated while posting spam to it. As parses the returned search results to find target forums. we describe later in Section 5, this exercise offered use- ful insights into defeating XRumer. Additionally, we uti- Composing Spam Message A prudent composition of lized the documentation to gain insights into its other the spam message is as important for spammers’ suc- functionalities. For example, one file defines the rules cess as is finding relevant target forums. Thus, all to solve text-based question-and-answer CAPTCHAs. messages must have an appropriate subject. To help While spammers can add more rules to it, this file per- defeat message-based filtering, XRumer supports vari- mitted us to determine the class of text CAPTCHAs ous macros that help create variants of spam messages XRumer can solve off the shelf. Another file included that are semantically the same but syntactically dif- information defining how XRumer identifies different ferent. For example, with a simple variation macro, forum software. Finally, other files contained various {variant 1|variant 2|...|variant N}, chooses one of User-Agent strings that XRumer can use to imperson- the N possible variants in the spammed message. The ate browsers, fake names, addresses, interests etc. This result is that a spammer can create different greetings information is used to create and register fake user ac- such as {Hi!|Hello!|What’s up!} in the spam message. counts on varying forums. XRumer leaves the task of choosing appropriate syn- The XRumer demo version does not provide for the onyms up to a spam campaign operator. However, the use of an anonymizing proxy that hides the poster’s IP spam automator SEnuke, supports an automated the- address. To overcome this limitation, we wrote our own saurus of synonyms a spammer can choose from. code that sends an HTTP request to our forum server by using a public anonymous proxy. This allowed us to ex- In addition to the simple variation macro, a spam- amine the change in the HTTP request forwarded by the mer may also use a conditional variation macro based proxy. on the forum’s environment. For example, the macro, In order to compare XRumer’s functionality with other {[T LD 1] text for T LD 1|...|[T LD N] text for T LD N} , de- automators, we surveyed the list of functions different cides the output text based on the top level domain (TLD) automators support based on their websites and tutorial of the forum used to post spam. This permits a large videos available online. This included numerous black- amount of contextualization. For example, {#.com hat search engine optimization (SEO) forums. Hi!|#.fr Bonjour!}, results in an English or French salutation depending on the .com or .fr TLD. Table 1 3 Primal Functionalities provides the full list of macros supported by XRumer. We begin by describing XRumer’s basic functionality, Multiple studies, including [15, 22, 19], have identified including how it finds target forums, composes spam, the use of macros in the context of email spam. In partic- and posts it. ular, Pitsillidis et al. in [19] explore how to model variant spam messages generated from the same spam botnet by 3.1 Preparation for Posting Spam inferring their templates. Collecting Target Forums In order to post spam, XRumer recommends spammers to use Bulletin Board XRumer needs target forums. A spammer can either pro- Code (BBCode) in composing spam messages, par- vide URLs of target forums or can use Hrefer, a free ticularly for embedded links. BBCode is a simple supplement to buyers of XRumer. Hrefer uses search markup language used to format posts in many forums. engines to find forums based on a specific list of key- For example, using [URL]...[/URL] transforms the words provided to it as input. A spammer can directly inserted URL into the corresponding anchor tag, <a provide the entire keyword list, or provide some initial href=“...”>...</a>, in the forum’s HTML rendering.