
Talkback: Reclaiming the Blogosphere
Elie Bursztein, Baptiste Gourdin, John Mitchell
Stanford University & LSV-ENS Cachan

What is a blog?

• A blog ("web log") is a site, usually maintained by an individual, with
  • Regular entries
  • Commentary
  • Entries displayed in reverse-chronological order

http://elie.im/blog

Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21

Key Statistics

• 184 million
• 73% of users read blogs
• 50% post comments

Source: Universal McCann

Anatomy of a blog post

Why are blogs special?


What is a ?

Trackback Illustrated

Little Timmy said to me... "What's Trackback, Daddy?"

1. "Best Post Ever"
"Wow! Jimmy Lightning has written the best post ever! It's so funny! And it's true! That's why it's so good. I need to tell the world!"

2. "Jimmy Lightning is swell"
"Check it out world! I've written all about Jimmy Lightning's post on my weblog. My weblog's called 'The Unbloggable Blogness of Blogging'. It's a good name huh? Wow, I sure hope Jimmy sees what I said about him..."

3. "Best Post Ever"
"Hey! I wonder what happens when I click on that Trackback link at the bottom of Jimmy's post?"

Trackback: "Jimmy Lightning is swell"
"Far out! In that funny pop-up is a link back to my site! Too cool for school!"

Why is the LinkBack problem different?

• A single spam can reach thousands of users
• LinkBack notifications are automated

Toward a secure world?

30 million LinkBack spam notifications per day

Spam Campaign example

[Figure: Trackback spams over time. Y-axis: number of spams (0 to 100,000). X-axis: date, from March 2007 to April 2008.]


The end of all hope?


Adversary Model

• Resourceful
• Efficient
• Knowledgeable
• Adaptive

Threats

• Blog spoofing
• Cried Wolf attack
• Linkback tampering and replay
• Spam in breadth or in depth
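Blog spoofing is easy to picture once the shape of a TrackBack ping is written down. The following is a minimal sketch, assuming the field names of the public TrackBack specification (`url`, `title`, `excerpt`, `blog_name`) sent as a form-encoded HTTP POST body. The function name and the example values are made up for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs


def build_trackback_ping(url, title, excerpt, blog_name):
    """Return the form-encoded body of a TrackBack ping (sent via HTTP POST)."""
    return urlencode({
        "url": url,
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    })


# Every field is chosen by the sender, so a spammer can claim any identity.
spoofed_body = build_trackback_ping(
    url="http://spam.example/pills",
    title="Great post!",
    excerpt="I wrote about your article...",
    blog_name="A Blog You Trust",
)

# The receiver sees only these self-declared fields; none of them is signed.
received = parse_qs(spoofed_body)
print(received["blog_name"][0])
```

Because the receiver has no key material to check these fields against, it cannot tell a spoofed ping from a genuine one without fetching the sender's page.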

Current Linkback specifications

                          RefBack          PingBack          TrackBack            TalkBack
Trigger mechanism         Visit from the   Code executed     Code executed        Code executed
                          sender site      at posting time   at posting time      at posting time
Notification medium       HTTP referer     XML-RPC call      HTTP POST            HTTP POST
Information sent          none             S post URL        S post URL           S post URL
                                           R post URL        S site name          S site name
                                                             S post title         S post title
                                                             S post excerpt       S post excerpt
                                                                                  SeedToken
                                                                                  S public key
                                                                                  R public key
                                                                                  Signature
Auto-discovery mechanism  none             LINK tag          Specially formatted  LINK tag
                                                             information in
                                                             the body
S authenticity            -                -                 -                    yes
R authenticity            -                -                 -                    yes
Integrity                 -                -                 -                    yes
Confidentiality           -                -                 -                    yes

Table 1: Linkback mechanisms comparison (S = sender, R = receiver)
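The auto-discovery row of Table 1 can be illustrated with a small parser. This is a sketch under assumptions: PingBack advertises its endpoint with `<link rel="pingback" href="...">`, which is what the example page uses. The `rel` value TalkBack uses and the way it embeds the public key are not given in these slides, so the `"talkback"` value below is purely hypothetical, as are the class and function names.

```python
from html.parser import HTMLParser


class LinkFinder(HTMLParser):
    """Collect href values of <link> tags whose rel attribute contains a given token."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if self.rel in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def discover(html, rel="pingback"):
    """Return the first advertised notification URL, or None if the page has none."""
    finder = LinkFinder(rel)
    finder.feed(html)
    return finder.hrefs[0] if finder.hrefs else None


PAGE = (
    '<html><head><title>Best Post Ever</title>'
    '<link rel="pingback" href="http://blog.example/xmlrpc.php">'
    '</head><body>...</body></html>'
)

print(discover(PAGE))                  # PingBack endpoint advertised by the page
print(discover(PAGE, rel="talkback"))  # hypothetical rel value, absent from this page
```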


[Figure 1: TalkBack protocol overview]

4.2 Posting a LinkBack

Once blogs are successfully registered with an authority, posting a LinkBack is achieved in at most four steps (Diagram 1). In the TalkBack protocol there are three or four participants:

1. A: The sender authority, which is used to authenticate the sender and enforce rate limiting.
2. S: The sender, which is the blog that wants to send the LinkBack notification.
3. R: The receiver, which is the blog that receives and processes the Linkback notification.
4. A′: The receiver authority, which might be different from the sender authority and is used to authenticate the receiver.

The five steps used to send a Linkback are depicted in diagram 1. Here is what happens during these steps:

1. The sender (S) requests a seed from the authority. This seed is used to prevent accumulation attacks, replay attacks, and to enforce rate limiting (Sec 7).
2. The sender (S) crawls the receiver (R) blog as usual, discovers the notification URL (Sec 6) and in addition fetches the receiver (R) public key used to authenticate the receiver in the Linkback and to encrypt data (Sec 7).
3. The sender (S) uses the seed fetched at step 1 along with the notification URL and public key fetched at step 2 to build and send the secure LinkBack to the receiver (R).
4. The receiver (R) performs security verification on the received LinkBack and eventually forwards it to the posting authority A to ensure that the LinkBack and sender S are still valid.
5. If the receiver (R) authority (A′) is different from the sender (S) authority (A), then the sender authority contacts the receiver authority to fetch and validate the receiver (R) identity. To improve performance, the receiver public key is cached. Note that there is a RESTful API in place that allows authorities to communicate. For clarity and because communication between authorities is straightforward, we assume for the rest of the paper that authorities (A) and (A′) are the same.

As we will see in sections 6 and 8, Talkback provides numerous additional features, such as caching and whitelisting, designed to reduce the workload. Accordingly, in some cases posting a Linkback only requires performing step 3.

4.3 Comparison with Other Mechanisms

As shown in table 1, TalkBack is the only LinkBack mechanism that provides security features. None of the other mechanisms provide a way to ensure sender and receiver authenticity, or LinkBack integrity and confidentiality. To ensure that Talkback will be a widely adopted standard, it has been designed to be robust, lightweight, easy to implement and compatible with web standards. This is why we choose, like PingBack, to use the standard HTML LINK tag to embed the discovery mechanism and the blog public key (Sec 6). This choice ensures that adding this information ...

Threats are not addressed by current Linkback specifications

Introducing Talkback!

Use a light-weight PKI to fight Spam

TalkBack Benefits

TalkBack is the only LinkBack mechanism in Table 1 that provides sender and receiver authenticity, integrity, and confidentiality.

Talkback Overview

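[Figure: TalkBack protocol overview. Participants: Sender, Receiver, Sender Authority, Receiver Authority.]

1. Seed request (Sender → Sender Authority)
2. Auto-discovery (Sender → Receiver)
3. TalkBack posting (Sender → Receiver)
4. TalkBack reporting (Receiver → Sender Authority)
5. Receiver validation (Sender Authority → Receiver Authority)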
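The overview can be walked through with a toy model. This is a sketch under stated assumptions, not the authors' library: the class names, method names, and the `/talkback` path are hypothetical, signatures and encryption are left out entirely, and the sender and receiver authorities are taken to be the same, so step 5 does not appear. It only shows how a one-time seed lets the authority reject a replayed notification.

```python
import secrets


class Authority:
    """Toy authority: registers blogs, issues one-time seeds, validates reports."""

    def __init__(self):
        self.registered = set()
        self.open_seeds = {}  # seed -> URL of the blog that requested it

    def register(self, blog_url):
        self.registered.add(blog_url)

    def request_seed(self, sender):
        # Step 1: only registered blogs obtain a seed.
        if sender not in self.registered:
            raise PermissionError("unregistered blog")
        seed = secrets.token_hex(16)
        self.open_seeds[seed] = sender
        return seed

    def report(self, sender, seed):
        # Step 4: a seed is consumed on first use and is bound to its requester.
        return self.open_seeds.pop(seed, None) == sender


class Receiver:
    """Toy receiving blog that defers validation to the authority."""

    def __init__(self, url, authority):
        self.url = url
        self.authority = authority
        self.accepted = []

    def notification_url(self):
        # Step 2: what auto-discovery would return (hypothetical path).
        return self.url + "/talkback"

    def post(self, talkback):
        # Step 3: receive the notification, then report it (step 4).
        valid = self.authority.report(talkback["sender"], talkback["seed"])
        if valid:
            self.accepted.append(talkback)
        return valid


authority = Authority()
authority.register("http://sender.example")
receiver = Receiver("http://receiver.example", authority)

seed = authority.request_seed("http://sender.example")
talkback = {
    "sender": "http://sender.example",
    "seed": seed,
    "target": receiver.notification_url(),
}

first_attempt = receiver.post(talkback)   # accepted: the seed is fresh
replay_attempt = receiver.post(talkback)  # rejected: the seed was already consumed
print(first_attempt, replay_attempt)
```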

Talkback Security Equation

Limited number of blogs + limited number of LinkBacks per blog = spam under control
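One way to read the equation: if registering a blog is costly and the authority caps how many LinkBacks each blog may send, the total volume a spammer can push is bounded by the product of the two. The sketch below illustrates that bound. The class name, function name, and the quota values are illustrative assumptions; the slides do not state the actual limits.

```python
from collections import Counter


class PerBlogQuota:
    """Allow each blog at most `limit` LinkBacks per accounting period."""

    def __init__(self, limit):
        self.limit = limit
        self.sent = Counter()

    def allow(self, blog):
        if self.sent[blog] >= self.limit:
            return False
        self.sent[blog] += 1
        return True


def max_spam(blogs_controlled, per_blog_limit):
    """Upper bound on LinkBacks a spammer can send in one period."""
    return blogs_controlled * per_blog_limit


quota = PerBlogQuota(limit=2)
decisions = [quota.allow("http://spam.example") for _ in range(3)]
print(decisions)

# With 10 registered blogs and 5 LinkBacks per blog, the bound is 50.
print(max_spam(blogs_controlled=10, per_blog_limit=5))
```

Raising the cost of registration shrinks the first factor, and the authority's rate limit fixes the second, which is why both terms are needed.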

Making blog registration costly

• Captcha
• Email verification
• Domain verification

LinkBack benefits

• Blog
  • Authenticity
  • Whitelisting / blacklisting
• Linkback
  • Integrity
  • Non-repudiation
  • Confidentiality (optional)

Reducing authority power

• Can’t forge LinkBacks: authorities have only blog public keys
• Respects privacy: authorities don’t see LinkBack content in secure mode
• No single point of failure/control: the protocol allows each blog to choose its authority

Implementation

• Open-source library
• Wordpress plugin
• Authority: https://talkback.stanford.edu

Authority benchmark

[Figure 3: Number of Talkbacks processed per second by the authority. X-axis: number of senders (0 to 65). Y-axis: number of Talkbacks per second (0 to 3000).]

... process per second. To make sure that the bottleneck was on the authority, we generated ahead of time 100 000 TalkBacks and used several senders/machines to send them at once to the authority. The senders are not actual blogs but custom PHP scripts that use our library. As visible in figure 3, as the number of senders increases the number of TalkBacks processed increases until it reaches a plateau around 2800 Talkbacks a second. This benchmark is properly evaluated on a 24-hour basis, because blog notifications are spread relatively evenly across a 24-hour period [24], due to timezone variations and blogger habits. Accordingly the fact that our authority using a single frontend is able to process around 242 million Talkbacks a day makes us confident that even though TalkBack is designed to allow multiple authorities (Sec. 4), our authority alone with additional frontends will be able to sustain the entire blogosphere that currently consists of around 180 million blogs [18].

10. ADDITIONAL RELEVANT WORK

In this section we present relevant work to our approach.

TrackBack Validator. The WordPress TrackBack Validator [23] looks at the sender URL to validate that the post contains the URL of the receiver. This approach increases the network load because each receiver will look at the sender’s page. This load increase can be used to perform a DOS attack with amplification: the attacker spoofs a simple HTTP request and the receiver will fetch the entire page. With our central authority this problem does not exist: only the authority will verify that all the links exist.

Reputation system. Using a reputation system alone for TrackBack spam is ineffective because an attacker may change the blog URL for every post. Therefore any long-term classification based on TrackBack is bound to fail because there is no way to prevent spoofing (under the current TrackBack specification).

IP Blacklisting. While blacklisting based on IP might currently work as the spammers today seem to use only a small number of IPs, it is not a sustainable solution because in the long run, it is likely that spammers will use botnets and therefore have a huge pool of IPs.

Rate limiting. Rate limiting at the blog level is not effective because a blog does not have a global view of the situation and therefore cannot stop spammers that target a huge number of blogs and post only once to each of them with the same IP.

Community filtering. It is possible to combine TalkBack with a community filtering approach that lets blog readers decide which notifications are relevant either by voting or by analyzing the click-through rate. The problem with the community approach is that, if a spam notification is sufficiently deceptive, blog readers won’t be able to flag it unless they visit the site it points to, which is exactly what we try to prevent.

More Related Work. Previous studies of spam email report that around 120 billion spam emails are sent every day [10]. In [11] and [14], the authors study a spam campaign by infiltrating the Storm botnet, while [2] analyzes the revenue generated by Storm spam. Former spammers relate their experiences in [12] and [29]. Blogosphere evolution is considered in a number of studies, including [16, 26, 18]. A DOS defense study [17] notes that ideas spread more quickly in the blogosphere than by email. In previous work on linkback spam, [20] examines ways that the language appearing in a blog can be used as a blocking defense. Similarly, [21] studies how the language of web pages, including blogs, can be used to detect spam. In [13] the authors use Support Vector Machines (SVM) to classify blog spam.

Blog benchmark (Wordpress 3.1)

[Figure 4: Number of Talkbacks processed per second by the receiving blog. X-axis: number of senders (0 to 35). Y-axis: number of Talkbacks per second (0 to 1200).]

We conducted a similar experiment to see how fast a receiving blog is able to process Talkback notifications. To make this test realistic, we used as a receiving blog Wordpress 3.0 (the latest version) equipped with our plugin. As in the previous test, senders are custom scripts that send 1000 talkback notifications as fast as possible. We also used a more standard hardware platform as the blog was hosted on a 2.4 GHz Intel quad core. As visible in figure 4, a single blog is able to process more than 1000 Talkbacks a second, which is more than enough even for a very high-traffic blog. It is unlikely that a single blog will receive more than 84 million notifications a day.

Thank you!

Questions?

Talkback: http://ly.tl/talkback
Follow us on Twitter: @elie, @bapt1ste
