Detecting Malicious Web Links and Identifying Their Attack Types


Hyunsang Choi, Korea University, Seoul, Korea ([email protected])
Bin B. Zhu, Microsoft Research Asia, Beijing, China ([email protected])
Heejo Lee, Korea University, Seoul, Korea ([email protected])

(This work was done when Hyunsang Choi was an intern at Microsoft Research Asia. Contact author: Bin B. Zhu ([email protected]).)

Abstract

Malicious URLs have been widely used to mount various cyber attacks including spamming, phishing and malware. Detection of malicious URLs and identification of threat types are critical to thwart these attacks. Knowing the type of a threat enables estimation of the severity of the attack and helps adopt an effective countermeasure. Existing methods typically detect malicious URLs of a single attack type. In this paper, we propose a method using machine learning to detect malicious URLs of all the popular attack types and to identify the nature of the attack a malicious URL attempts to launch. Our method uses a variety of discriminative features including textual properties, link structures, webpage contents, DNS information, and network traffic. Many of these features are novel and highly effective. Our experimental studies with 40,000 benign URLs and 32,000 malicious URLs obtained from real-life Internet sources show that our method delivers a superior performance: the accuracy was over 98% in detecting malicious URLs and over 93% in identifying attack types. We also report our studies on the effectiveness of each group of discriminative features, and discuss their evadability.

1 Introduction

While the World Wide Web has become a killer application on the Internet, it has also brought an immense risk of cyber attacks. Adversaries have used the Web as a vehicle to deliver malicious attacks such as phishing, spamming, and malware infection. For example, phishing typically involves sending an email seemingly from a trustworthy source to trick people into clicking a URL (Uniform Resource Locator) contained in the email that links to a counterfeit webpage.

To address Web-based attacks, a great effort has been directed towards the detection of malicious URLs. A common countermeasure is to use a blacklist of malicious URLs, which can be constructed from various sources, particularly human feedback, which is highly accurate yet time-consuming to collect. Blacklisting incurs no false positives, yet it is effective only for known malicious URLs; it cannot detect unknown ones. The very nature of exact matching in blacklisting renders it easy to evade.

This weakness of blacklisting has been addressed by anomaly-based detection methods designed to detect unknown malicious URLs. In these methods, a classification model based on discriminative rules or features is built either with a priori knowledge or through machine learning. The selection of discriminative rules or features plays a critical role in the performance of a detector, so a main research effort in malicious URL detection has focused on selecting highly effective discriminative features. Existing methods were designed to detect malicious URLs of a single attack type, such as spamming, phishing, or malware.

In this paper, we propose a method using machine learning to detect malicious URLs of all the popular attack types, including phishing, spamming and malware infection, and to identify the attack types malicious URLs attempt to launch. We have adopted a large set of discriminative features related to textual patterns, link structures, content composition, DNS information, and network traffic. Many of these features are novel and highly effective. As described later in our experimental studies, link popularity and certain lexical and DNS features are highly discriminative in not only detecting malicious URLs but also identifying attack types. In addition, our method is robust against known evasion techniques such as redirection [42], link manipulation [16], and fast-flux hosting [17].
Identification of attack types is useful since knowledge of the nature of a potential threat allows us to take a proper reaction as well as a pertinent and effective countermeasure against the threat. For example, we may conveniently ignore spamming but should respond immediately to malware infection. Our experiments on 40,000 benign URLs and 32,000 malicious URLs obtained from real-life Internet sources show that our method achieves an accuracy rate of more than 98% in detecting malicious URLs and an accuracy rate of more than 93% in identifying attack types.

This paper has the following major contributions:

• We propose several groups of novel, highly discriminative features that enable our method to deliver a superior performance and capability on both detection and threat-type identification of malicious URLs of the main attack types, including spamming, phishing, and malware infection. Our method provides a much larger coverage than existing methods while maintaining a high accuracy.

• To the best of our knowledge, this is the first study to classify multiple types of malicious URLs, known as a multi-label classification problem, in a systematic way. Multi-label classification is much harder than binary detection of malicious URLs, since multi-label learning has to deal with the ambiguity that an entity may belong to several classes.

The remainder of this paper is organized as follows. We present the proposed method and the learning algorithms it uses in Section 2, and describe the discriminative features our method uses in Section 3. Evaluation of our method with real-life data is reported in Section 4. We review related work in Section 5, and conclude the paper in Section 6.

2 Our Framework

2.1 Overview

Our method consists of three stages, as shown in Figure 1: training data collection, supervised learning with the training data, and malicious URL detection and attack type identification. These stages can operate sequentially, as in batch learning, or in an interleaving manner: additional data is collected to incrementally train the classification models while the models are used in detection and identification. Interleaving operations enable our method to adapt and improve continuously with new data, especially with online learning, where the output of our method is subsequently labeled and used to train the classification models.

[Figure 1: The framework of our method. Input: URL → 1. Data Collection → 2. Supervised Learning → 3-1. Detection / 3-2. Identification → Output: benign URL, or malicious URL with its attack {Type}.]

2.2 Learning Algorithms

The two tasks performed by our method, detecting malicious URLs and identifying attack types, need different machine learning methods. The first task is a binary classification problem, for which the Support Vector Machine (SVM) is used to detect malicious URLs. The second task is a multi-label classification problem, for which two multi-label classification methods, RAkEL [38] and ML-kNN [48], are used to identify attack types.

Task 1: Support Vector Machine (SVM). SVM is a widely used machine learning method introduced by Vapnik et al. [8]. SVM constructs hyperplanes in a high or infinite dimensional space for classification. Based on the Structural Risk Minimization theory, SVM finds the hyperplane that has the largest distance to the nearest training data points of any class, called the functional margin. Functional margin optimization can be achieved by maximizing the following objective:

$$\sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

subject to

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, n,$$

where $\alpha_i$ and $\alpha_j$ are coefficients assigned to training samples $x_i$ and $x_j$, and $K(x_i, x_j)$ is a kernel function used to measure the similarity between the two samples. After the kernel function is specified, SVM computes the coefficients that maximize the margin of correct classification on the training set. $C$ is a regularization parameter that trades off training error against margin, that is, training accuracy against model complexity.

Task 2: RAkEL and ML-kNN. RAkEL is a high-performance multi-label learning method that accepts any multi-label learner as a parameter. RAkEL creates m random sets of k label combinations, and builds an ensemble of Label Powerset (LP) [47] classifiers from each of the random sets. LP is a transformation-based algorithm that accepts a single-label classifier as a parameter; it considers each distinct combination of labels that exists in the training set as a different class value of a single-label classification task. A ranking of the labels is produced by averaging the zero-one predictions of each model per considered label, and an ensemble voting process under a threshold t is then employed to decide the final classification set. We use C4.5 [32] as the single-label classifier and LP as the parameter of the multi-label learner.

ML-kNN is derived from the traditional k-Nearest Neighbor (kNN) algorithm [1]. For each unseen instance, its k nearest neighbors in the training set are first identified. Based on the statistical information gained from the label sets of these neighboring instances, the maximum a posteriori principle is then applied to determine the label set for the unseen instance.
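As a concrete illustration of Task 1, the sketch below trains a kernel SVM on URL feature vectors with scikit-learn. This is a minimal sketch rather than the authors' implementation: the feature matrix is synthetic stand-in data, and the RBF kernel and C value are illustrative defaults, not the paper's tuned configuration.

```python
# Minimal sketch of Task 1 (binary detection) using scikit-learn's SVM.
# The feature matrix is synthetic; in the paper each URL is represented by
# lexical, link-popularity, content, DNS, DNS-fluxiness and network features.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))       # placeholder URL feature vectors
y = rng.integers(0, 2, size=1000)     # 1 = malicious, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Feature scaling matters for RBF kernels; C trades training error
# against margin, as in the dual formulation above.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```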
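Task 2 can be prototyped in a similar way. The sketch below assumes the scikit-multilearn library for its RAkEL implementation, uses synthetic data with three labels (spam, phishing, malware), substitutes scikit-learn's CART decision tree for the paper's C4.5 single-label classifier (C4.5 has no standard scikit-learn implementation), and uses scikit-learn's native multi-label kNN as a rough stand-in for ML-kNN, which additionally applies Bayesian smoothing over the neighbors' label statistics.

```python
# Minimal sketch of Task 2 (multi-label attack-type identification).
# Assumes scikit-multilearn is installed; data are synthetic, CART stands
# in for C4.5, and plain multi-label kNN approximates ML-kNN.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from skmultilearn.ensemble import RakelD   # RAkEL with disjoint labelsets

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))             # placeholder feature vectors
Y = rng.integers(0, 2, size=(500, 3))      # columns: spam, phishing, malware

# RAkEL: an ensemble of Label Powerset classifiers over random labelsets.
rakel = RakelD(base_classifier=DecisionTreeClassifier(), labelset_size=2)
rakel.fit(X, Y)
print("RAkEL:", rakel.predict(X[:3]).toarray())

# Multi-label kNN baseline (ML-kNN would add a maximum a posteriori
# decision on top of the neighbors' label counts).
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X, Y)
print("kNN:  ", knn.predict(X[:3]))
```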
3 Discriminative Features

Our method uses the same set of discriminative features for both tasks, malicious URL detection and attack type identification. These features can be classified into six groups: lexical, link popularity, webpage content, DNS, DNS fluxiness, and network traffic.

... at other positions. Therefore, we discard the widely used "bag-of-words" approach and adopt several new features differentiating SLDs (second-level domains) from tokens at other positions, resulting in a higher robustness against lexical manipulations by attackers. Lexical features No. 1 to No. 4 in Table 1 are from previous work. Feature No. 10 is different from ...
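To illustrate what position-aware lexical features can look like in practice, the sketch below separates the second-level domain from tokens elsewhere in the URL instead of pooling everything into one bag of words. The feature set is a hypothetical stand-in, not the paper's Table 1, and the naive SLD split ignores multi-part public suffixes such as .co.uk, which a library like tldextract would handle.

```python
# Hedged sketch of position-aware lexical features: the SLD is treated
# separately from tokens at other URL positions. Feature names are
# illustrative, not the paper's Table 1.
from urllib.parse import urlparse

def lexical_features(url: str) -> dict:
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = host.split(".")
    sld = labels[-2] if len(labels) >= 2 else host  # naive SLD extraction
    path_tokens = [t for t in parsed.path.split("/") if t]
    return {
        "url_len": len(url),
        "sld": sld,
        "sld_len": len(sld),
        "sld_digit_ratio": sum(c.isdigit() for c in sld) / max(len(sld), 1),
        "num_subdomains": max(len(labels) - 2, 0),
        "num_path_tokens": len(path_tokens),
    }

print(lexical_features("http://login.secure-update.example.com/account/verify"))
```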