A Survey on Adversarial Information Retrieval on the Web Saad Farooq CS Department FAST-NU Lahore
[email protected] Abstract—This survey paper discusses different forms of get the user to divulge their personal information or financial malicious techniques that can affect how an information details. Such pages are also referred to as spam pages. retrieval model retrieves documents for a query and their remedies. In the end, we discuss about spam in user-generated content, including in blogs and social media. Keywords—Information Retrieval, Adversarial, SEO, Spam, Spammer, User-Generated Content. II. WEB SPAM I. INTRODUCTION Web spamming refers to the deliberate manipulation of The search engines that are available on the web are search engine indexes to increase the rank of a site. Web frequently used to deliver the contents to users according to spam is a very common problem in search engines, and has their information need. Users express their information need existed since the advent of search engines in the 90s. It in the form of a bag of words also called a query. The search decreases the quality of search results, as it wastes the time engine then analyzes the query and retrieves the documents, of users. Web spam is also referred to as spamdexing (a images, videos, etc. that best match the query. Generally, all combination of spam and indexing) when it is done for the search engines retrieve the URLs, also simply referred to as sole purpose of boosting the rank of the spam page. links, of contents. Although, a search engine may retrieve thousands of links against a query, yet users are only There are three main categories of Web Spam [1] [2].