Journal of Information and Computational Science ISSN: 1548-7741

A Heuristic Model to Detect Malicious URLs using Case Based Reasoning

Dr. Sarika S
Assistant Professor, Naipunnya Institute of Management and Information Technology, Koratty, Thrissur
[email protected]

Abstract

Phishing is the fraudulent practice of deceiving users by hijacking sensitive information, such as bank account details, credit card numbers, and logins on email and social networking sites, through social engineering. It can occur in varied forms, such as malicious URLs and links, malware-based attacks, and so on. The most devious form of attack occurs when the user sees an innocuous page with the same look and feel as a genuine webpage but with a phishing URL. Malicious URLs host unsolicited content and lure users into becoming victims of identity theft and financial losses. Effective systems to detect such malicious URLs in a timely manner are necessary to counter a variety of web security threats. This paper presents a heuristic method to analyze the pattern of phishing URLs and check whether they are malicious or not. The method leverages a multi agent system with case based reasoning to detect the attack. The experimental results show that the proposed method is capable of achieving a good true positive rate with an accuracy of 98%.

Keywords: Antiphishing, multi agent system, URL analysis, case based reasoning

1. Introduction

With the advent of the internet for business, finance and personal investments, threats from internet criminals are on the rise. They conduct counterfeit transactions to deceive users and steal personal information from them. The most venomous form of internet crime is phishing. Phishing is a form of online identity theft that aims to steal sensitive information such as usernames, passwords, and credit card details by masquerading as a trustworthy entity in an electronic communication. Nowadays, identity theft is one of the most profitable crimes committed by fraudsters, and criminals are exploiting the best possible resources to carry out their tasks. A recent report highlighted an increase in the number of identity thefts, and the phishing loss is estimated at billions of dollars for the affected organizations. A good example is the WannaCry attack [1], a worldwide cyber attack which infected more than 230,000 computers running the Microsoft Windows operating system across 150 countries during May 2017. As phishing sites are a cornerstone of internet criminal activities, there has been broad interest in developing systems to prevent the end user from visiting such sites. However, the effectiveness of such systems is questionable, as phishers change their tactics and materialize with new threats.


Most phishing methods use spoofed URLs, a common trait of phishing scams, which pose a serious threat to end users and commercial institutions. URL obfuscation is a form of technical deception wherein victims are made to think that a link or web site displayed in their web browser or email client is that of a trusted site when it is not. As users become aware of phishing and can detect fake emails and web sites, attackers design webpages with convincing content as bait to steal users' personal information. URL obfuscation allows a hostile web site to exploit vulnerabilities in web browsers that permit the download and execution of malicious code. These methods tend to be technically simple yet highly effective, and are still used to perform deception. Thus, identifying phishing URLs has become both a necessity and a challenge in the context of online security. The main contribution of this paper is to monitor and detect phishing sites which masquerade as benevolent ones using multi agent systems with case based reasoning. The agents in this system can learn and detect phishing attacks based on the selected features. The technique is adaptive and dynamic with respect to new cases. The method runs with a small set of features and detects fraudulent URLs.

2. Literature Review

Most browsers now use URL verification methods to protect users from phishing attacks. This is because URL filtering is orders of magnitude faster than typical web page classification, as the entire webpage does not have to be fetched and analyzed. The approach is simple and effective, as it produces a low average error rate. Moreover, the average response time is low, as it considers only the URL of a webpage. Some of the URL filtering techniques are described below.

SPS [2] is a simple filtering algorithm that protects clients from phishing attacks by removing part of the malicious content that traps clients into entering personal information. This approach analyzes the behaviors of novice users and of phishers to formulate requirements for SPS. The approach uses a two-level filtering algorithm composed of Strict URL Filtering (SUF) and HTTP Response Sanitizing (HRS). The URL filtering component uses a rule set which analyzes the URLs in HTML documents and categorizes them as safe or suspicious. If a URL is suspicious, the HTTP response sanitizing component removes any input forms from the HTML documents and alerts the users about the malicious parts through a sanitized web page. The method is convenient, as it does not prohibit the user from browsing any webpage; it only blocks them from disclosing their personal information to unknown URLs. However, SPS is vulnerable to pharming attacks.

According to [3], phishing attacks can be identified through target domain identification. In this work, an algorithm is implemented to identify the phishing target which masquerades as genuine. The approach also groups the domains from hyperlinks having direct or indirect association with a suspected webpage. In order to obtain a target domain set, the domains gathered from the directly associated webpages are cross checked with the domains gathered from the indirectly associated webpages. After applying the Target Identification (TID) algorithm on this set, the matched domains are cancelled. The resultant set is compared with a DNS server to identify the legitimacy of a suspicious page.

PageSafe [4] is an antiphishing tool that prevents access to phishing sites through URL validation and also detects DNS poisoning attacks. PageSafe


examines the anomalies in web pages and uses a machine learning approach for automatic classification. As the method does not preserve any secret information, it requires very little input from the user. PageSafe performs automatic classification by incorporating user assistance, so that the number of false positives is reduced significantly. The approach also maintains a whitelist, a list of domains with mappings to corresponding IP addresses. This list is referenced first when resolving the IP of a URL, to protect the user from DNS poisoning attacks. The whitelist is encrypted with a master password. By using PageSafe, users are able to decide whether or not a web page is legitimate.

The method proposed by Ma et al. [5] describes how to identify suspicious URLs. The method detects malicious URLs using both lexical and host based features in a balanced set. They compared the accuracy of batch [6] and online learning algorithms [7] using these features. Six lexical and host based features are considered from a URL, without including any content from the webpage. They used various online algorithms such as Perceptron, Logistic Regression with Stochastic Gradient Descent, Passive Aggressive (PA) and Confidence Weighted (CW) algorithms to compare with a batch processing algorithm, and showed that online learning algorithms work better than batch learning for detecting malicious websites. Among the classifiers, the Confidence-Weighted (CW) algorithm offered the best accuracy, up to 99% over a balanced data set.

A hybrid model [8] to detect phishing sites using K-Means Clustering [16] and a Naive Bayes Classifier [22] considers URL features and HTML features of a site to label it as phishing or legitimate. The K-Means Clustering is applied on the URL features of the web site by defining three clusters, and a feature set is plotted in one of the clusters of the database to check the validity of the site. If the result of k-means clustering is not enough to determine the site's legitimacy, the method further extracts HTML features of the webpage using a DOM representation. A combination of both URL and HTML features of the webpage is used to create the feature set, which is applied to a Naive Bayes Classifier to determine the probability of the webpage being phishing or not.

The work of Huang et al. [9] demonstrates a method to find phishing URLs using a supervised learning approach with an SVM classifier. To model the SVM, they considered 23 features to construct the feature vector: 4 structural features, 9 lexical features and 10 brand name features. If the classification output is -1, the URL is phishing; otherwise it is labeled as genuine for output 1.

A semisupervised approach for identifying phishing URLs in a realistic scenario has been proposed by Gyawali et al. [10]. This method focuses on reducing the cost of training a supervised algorithm by relying on fewer manually labeled data. In order to reduce manually labeled data, a semisupervised approach is applied and the learning algorithm is trained using a collection of manually labeled and pseudo labeled data. The approach could detect more phishing URLs compared to a fully supervised approach, using only 10% manually labeled data.

To show the usefulness of the URL alone in performing web page classification, Kan and Thi proposed a technique for webpage classification using URLs [11]. The approach segments the URL into meaningful tokens.
These tokens are given as input to an analysis module that derives useful composite features for classification. The resultant binary features are used in supervised maximum entropy modeling. These features model sequential dependencies


between tokens, their orthographic patterns, length, and originating URL component. To analyze the effectiveness of the method, machine learning techniques are used to induce a multiclass or regression model from labeled training URLs. Maximum Entropy (ME) modeling [12] is used here for binary, multi-class and hierarchical classification. After training, new URLs can be fed to the classifier as test values.

A method to detect phishing sites using heuristics in the URL and the page rank of websites has been proposed by Nguyen et al. [13]. They considered six different heuristics derived from the URL and webpage ranking, and computed a metric for each component by finding the value and weight of the heuristics. The combined metric value is compared with a threshold value to decide whether a website is phishing or genuine.

In [14], phishing URLs are detected using text feature extraction. Initially, the webpage is parsed to get plain text terms, the URL and the domain name from the respective web page. Here an HTML parser is used to create a Document Object Model (DOM). The extracted terms are fed to the TF-IDF algorithm to identify the importance of each detected term, and its weight is calculated. Moreover, the weight of URLs is also calculated according to their frequency in the document of the source code. Further, a search engine lookup is carried out to recognize the most important terms, which help to detect possible victim URLs for a given input website. Then, a WHOIS lookup is used to compare the registration details of websites to categorize a website as phishing or legitimate.

Another work proposed in [15] focused on classification of URLs based on lexical, keyword based, reputation based and search engine based features. Using these, a feature vector with 138 dimensions is created for each URL. The extracted features are fed to seven different classifiers for empirical evaluation: 1) SVMs with RBF kernel, 2) SVMs with linear kernel, 3) Multilayer Perceptron (MLP), 4) Random Forest (RF), 5) Naive Bayes (NB), 6) Logistic Regression (LR) and 7) C4.5, which is implemented as J48 in WEKA. The experimental results showed that RF outperformed all other classifiers in terms of classification performance. This supervised learning method was able to detect phishing URLs with an accuracy of more than 99.4%, with false positive and false negative rates of less than 0.5%.

This paper is an extension of the author's previous papers [17][18][19][20]. The proposed method is able to resist URL phishing attacks with few resources in hand and provides user intervention while implementing the technique, where other methods fail. It uses three agents to monitor fraudulent URLs and alerts the user during an attack so that the user can act accordingly.

3. Multi Agent Systems

Agent technology can be widely used to help users achieve various tasks in diverse domains such as network management, data mining, database systems, air-traffic control, telecommunication, and electronic commerce [25]. In the era of endless information flows, the idea of having a software agent that can perform complex tasks on our behalf is intuitively appealing. Software agents are a powerful tool for implementing next-generation information systems and are suitable for use in a wide variety of applications. They can make it much easier to build many kinds of multifaceted systems. However, for


building systems that exhibit complex behaviors, a single agent is not sufficient. As the technology matures and addresses sophisticated problems, the need for systems that consist of multiple agents becomes apparent. The MAS approach [24] seems to be the most feasible solution for such scenarios. Multi agent systems differ from single-agent systems in that several agents exist which model each other's goals and actions. Multi agent systems divide the problem into modules which operate asynchronously. This decomposition allows each agent to use the most appropriate technique for solving its subproblem. Interdependent problems are solved with coordinated effort from multiple agents. These agents are autonomous and heterogeneous in nature.

4. Case-Based Reasoning

Case-based reasoning (CBR) is a paradigm for solving problems by utilizing the solutions of previously solved similar problems [23]. Every previous problem is considered a case. A case is a set of related facts or information, and it contains a description of the problem, a solution and an outcome. A case base comprises a collection of these cases. In order to solve a current problem, the problem is matched against the historical data in the case base, and similar cases are retrieved. The retrieved cases are analyzed to check whether they can be reused to suggest a solution for the current problem. In some scenarios, the existing solution may be further modified or revised. Finally, the current problem and its solution are retained as a new case. Case based reasoning proceeds in four steps, sketched in code after this list:

• Retrieve the most similar cases from past experience;
• Reuse the retrieved case to solve the current problem;
• Revise and adapt the proposed solution;
• Retain the final solution for future reference.
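As a rough illustration of this cycle for URL cases (all class and method names here are hypothetical, not from the paper's implementation), the retrieve and retain steps might look like the following in Java:

```java
import java.util.ArrayList;
import java.util.List;

// A case pairs a URL feature vector with its known label.
class UrlCase {
    final double[] features;
    final boolean phishing;
    UrlCase(double[] features, boolean phishing) {
        this.features = features;
        this.phishing = phishing;
    }
}

class CaseBase {
    private final List<UrlCase> cases = new ArrayList<>();

    // Retrieve: return the stored case closest (squared Euclidean
    // distance) to the current problem, or null if the base is empty.
    UrlCase retrieve(double[] query) {
        UrlCase best = null;
        double bestDist = Double.MAX_VALUE;
        for (UrlCase c : cases) {
            double d = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = query[i] - c.features[i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // Retain: store a solved problem as a new case for future reuse.
    void retain(UrlCase solved) {
        cases.add(solved);
    }
}
```

In this sketch, the reuse step would copy the retrieved case's label when the distance falls below a similarity threshold, and the revise step would correct that label before retaining it whenever the suggested solution turns out to be wrong.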

5. The Detection Model

Figure 1. Agent Hierarchy

The detection model uses three levels of agents, as shown in Figure 1. When a request for a webpage is received from the user, the URL agent extracts the URL and checks whether it is malicious or not. The legitimacy of the URL is checked first by matching it against a URL blacklist and then by performing structural analysis. For


structural analysis, similar cases found in the case base are reused. Otherwise, a new case is generated and added to the case base. In level 2, a manager agent is responsible for coordination, communication, decision-making and its evaluation. The interface agent in level 3 deals with the interaction of the user with the system. If a webpage is deemed phishing, the interface agent alerts the user accordingly. The attack can be detected in different tabs of a browser in parallel by deploying a separate URL agent in each tab; a launcher sketch follows Figure 2. The case based reasoning model used in the proposed system is shown in Figure 2.

Figure 2. Case Based Reasoning for URL detection
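Since the implementation deploys agents with JADE (Section 7), one way to realize the per-tab deployment described above is a launcher of the following shape. This is a sketch only: the tab count is assumed, and UrlAgent is the hypothetical agent class sketched in Section 7.

```java
import jade.core.ProfileImpl;
import jade.core.Runtime;
import jade.wrapper.AgentController;
import jade.wrapper.ContainerController;

// Spawn one URL agent per open browser tab so that attacks in
// different tabs are detected in parallel (Section 5).
public class AgentLauncher {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.instance();
        ContainerController cc = rt.createMainContainer(new ProfileImpl());
        int openTabs = 3;  // hypothetical tab count
        for (int i = 0; i < openTabs; i++) {
            AgentController a =
                cc.createNewAgent("url-agent-" + i, "UrlAgent", null);
            a.start();
        }
    }
}
```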

6. URL Verification

The technique uses a URL blacklist to find fraudulent URLs. A URL blacklist contains a well categorized list of malicious URLs which are deemed inappropriate for web users. When the currently visited URL differs from the earlier stored version, the URL blacklist is used to check its genuineness. The method subsequently proceeds to a URL structural analysis phase if the URL blacklist fails to show that the URL is fake. This module anatomizes the URL according to various features and computes a total score based on the


existence of those features in the URL. The output of the URL structural analysis decides whether the URL is legitimate or malicious. The method is quite useful, as legitimate users can avoid falling into fraudulent sites accidentally.
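A compact sketch of this two-stage check (class and parameter names are illustrative; the threshold of three detections is introduced in Section 6.2):

```java
import java.util.Set;
import java.util.function.ToIntFunction;

// Stage 1: blacklist lookup (Section 6.1).
// Stage 2: structural analysis score over URL features (Section 6.2).
class UrlVerifier {
    private static final int THRESHOLD = 3;      // default: three detections
    private final Set<String> blacklist;
    private final ToIntFunction<String> scorer;  // counts suspicious features

    UrlVerifier(Set<String> blacklist, ToIntFunction<String> scorer) {
        this.blacklist = blacklist;
        this.scorer = scorer;
    }

    boolean isMalicious(String url) {
        if (blacklist.contains(url)) return true;    // known phishing URL
        return scorer.applyAsInt(url) >= THRESHOLD;  // suspicious-feature score
    }
}
```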

6.1 Blacklisting URL

Blacklisting is a very prominent spam filtering technique. There are private blacklists (McAfee, Hotmail, Cloudmark) and public blacklists. Private blacklists cannot be queried from outside, whereas public blacklists can be accessed and used externally. A number of publicly available blacklists exist, such as Spamhaus, SURBL, URIBL, Symantec and PhishTank. Browsers refer to blacklists of known phishing attempts to block domains and IP addresses. In the proposed method, PhishTank is used as the blacklist. The PhishTank community verifies whether a submitted website is phishing or not. However, the use of a blacklist is not always a viable means of blocking attacks, as new phishing sites are hosted every day to launch attacks and it is impossible to include all of them in the blacklist. Thus, URLs not blocked by the blacklist undergo a structural analysis to verify their legitimacy.
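As an illustration only (the snapshot file and its one-URL-per-line format are assumptions, not details from the paper), a local dump of a public blacklist can be loaded into a set for constant-time lookup:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

class Blacklist {
    // Load one URL per line from a periodically refreshed local snapshot
    // of a public blacklist such as a PhishTank dump (path hypothetical).
    static Set<String> load(String path) throws IOException {
        return new HashSet<>(Files.readAllLines(Paths.get(path)));
    }
}
```

A set lookup keeps the blacklist stage cheap, so the costlier structural analysis runs only for URLs the blacklist does not cover.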

6.2 Structural Analysis of URL

The URL agent checks whether the recently visited URL is blacklisted. If the URL is included in the blacklist, the user is advised accordingly. For the blacklist to work properly, it should ideally contain every phishing website, which is impossible. As a result, it can lead to a number of missed detections. So, the webpage addresses that are not blocked by the blacklist are given a structural analysis in which 25 salient features are extracted from the doubtful URL and a total score is calculated. The occurrence of each feature in the URL adds one to the total score of the URL check. If the score is above a certain threshold, the page is marked as phishing. The default threshold is three detections. For the structural analysis, the proposed method uses 25 features selected by observing the heuristics in the structure of phishing URLs and by referring to the literature [9][15]. The suspicious URL is decomposed into host and path, from which the bag-of-words (strings delimited by '/', '?', '.', '=', '-' and '_') is extracted. Subsequently, the method chooses 5 lexical features, 10 token based features and 10 target based features, as shown in Table 1.

Table 1. Feature types and their counts

Feature Type    Count
Lexical         5
Token based     10
Target based    10
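A minimal sketch of the tokenization and additive scoring described above, assuming the underscore as the sixth delimiter; only the token based part of the 25-feature score is illustrated here:

```java
import java.util.Arrays;
import java.util.List;

class StructuralAnalyzer {
    // Bag-of-words tokenization over the delimiters of Section 6.2.
    static List<String> tokenize(String url) {
        return Arrays.asList(url.toLowerCase().split("[/?.=_-]"));
    }

    // Each suspicious keyword present in the URL adds one to the score;
    // the page is flagged once the score reaches the threshold (default 3).
    static int score(String url, List<String> suspiciousTokens) {
        int s = 0;
        for (String token : tokenize(url)) {
            if (suspiciousTokens.contains(token)) s++;
        }
        return s;
    }
}
```

In the full method, the lexical and target based features of Section 6.2.1 would contribute to the same total score.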


The proposed URL analysis module extracts lexical, token based and target based features to find whether the currently visited URL exhibits any suspicious behavior. The steps involved in the URL structural analysis are shown in Figure 3.

Figure 3. Structural Analysis of URL

6.2.1 URL Features

• Lexical Features

The lexical (textual) features help to identify that malicious URLs tend to "look different" from legal URLs. The approach chose 5 lexical features by analyzing the composition of phishing URLs on PhishTank.com. The lexical features are: digits in the host, an IP address in the URL, the suspicious character '@', the number of dots and slashes in the path, and the length of the URL. By examining many phishing websites, it is noticed that several of them use digits in the URL to lure users. The digits are used in the host part or in the path of phishing URLs. If digits are present, the URL can be phishing. An IP address effectively disguises the owner of a website. To hide the domain name of the visited website, phishers sometimes use an IP address in the URL instead of the domain name, pretending to be an authentic domain. Sometimes the IP address is converted to its hexadecimal form, as in the following link: http://0x95.0xBA.0xDC.0x43/5/ebay.er/index.html. While checking the domain part, if it contains a hexadecimal value or an IP address, then it is considered deceptive. Phishers generally use the suspicious character '@' to redirect the user to a website different from the current domain. The browser may ignore everything prior to the '@' symbol, since the real address often follows it. If a URL contains the '@' symbol, the URL is likely to be a phishing site. If the URL contains three or more dots and slashes, then it is considered a potential phishing candidate. For example, http://www.ttisuccessinsightsperu.com/modules/File.documentts/Imagedrive.documento/Imagedrive.file/filewords/index.php is a phishing site targeting www.gmail.com.


It is found that phishers prefer longer URLs, to hide the suspicious portion of the URL. Technically, there is no fixed character length for a URL. The proposed length of legitimate URLs is less than 70 characters, beyond which a URL can be phishy. According to the analysis, if the character length is 56 or more, the URL is considered fake. Table 2 shows the lexical features and the data types used for them.

Table 2. Lexical features and their types

No.  Lexical Feature                         Type
1    Presence of digits in URL               Integer
2    IP address                              Boolean
3    Presence of suspicious character '@'    Boolean
4    Number of dots and slashes              Integer
5    Length of URL                           Integer
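A sketch of how these five features might be extracted, as a minimal illustration under assumed regular expressions for dotted-quad and hexadecimal hosts (not the paper's implementation):

```java
import java.net.URI;
import java.net.URISyntaxException;

class LexicalFeatures {
    // Extract the five lexical features of Table 2 from a URL.
    static int[] extract(String url) throws URISyntaxException {
        URI u = new URI(url);
        String host = u.getHost() == null ? "" : u.getHost();
        String path = u.getPath() == null ? "" : u.getPath();

        int digits = host.replaceAll("\\D", "").length();      // feature 1
        boolean ipHost =                                        // feature 2
            host.matches("(\\d{1,3}\\.){3}\\d{1,3}")            // dotted quad
         || host.matches("(0[xX][0-9a-fA-F]+\\.){3}0[xX][0-9a-fA-F]+"); // hex form
        boolean hasAt = url.contains("@");                      // feature 3
        long dotsSlashes = path.chars()                         // feature 4
                               .filter(c -> c == '.' || c == '/').count();
        int length = url.length();                              // feature 5

        return new int[] { digits, ipHost ? 1 : 0, hasAt ? 1 : 0,
                           (int) dotsSlashes, length };
    }
}
```

The integer features would feed the score through the thresholds given in the text (three or more dots and slashes, length of 56 or more), while the Boolean features add one directly.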

• Token Based Features

Malicious URLs may contain eye catching keywords or red flag tokens to attract end users. The 10 selected keywords include login, signin, update, verify, secure, banking, webscr, dispatch, cgi and account.

• Target Name Features

From the PhishTank data archive, an analysis was done over the monthly stats archives, collecting the top 10 brands used by fraudsters during the period from January to May 2017. The most popular target was PayPal, with a maximum of 16421 valid phishes against this site. The other targets include Google, Microsoft, JPMorgan, Apple, Banco de Brasil, Dropbox, Yahoo and Amazon.

7. Experimental Evaluation

The experiments are performed using a Core i3 @ 2.20 GHz processor, 4 GB of RAM and JDK 1.8 on the Windows 7 platform. The JADE software framework is used to deploy agents in the browser to detect URL obfuscations. The distributed multiagents communicate via FIPA ACL. The experiment is conducted on 1500 legitimate pages and 1500 phishing pages. The legitimate pages are common webpages, and the phishing sites are taken from PhishTank. PhishTank [21] is the largest collaborative clearing house for data and information about phishing scams on the internet. After submission to PhishTank, a potential phishing URL is verified by a number of registered users to confirm it as phishing. For the evaluation, 1500 confirmed phishing URLs were collected from January 1 to May 31, 2017. The occurrence pattern of each URL feature in the phishing datasets is monitored and plotted.
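For flavor, a minimal JADE agent of the kind that could host the URL check is sketched below. The UrlAgent class is the hypothetical one referenced by the launcher in Section 5, and the verdict strings are illustrative; only the JADE boilerplate is standard.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

// Level-1 agent: receives a URL in a FIPA ACL message, checks it,
// and replies with a verdict for the manager agent to act on.
public class UrlAgent extends Agent {
    @Override
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                ACLMessage msg = myAgent.receive();
                if (msg == null) { block(); return; }  // wait for next message
                ACLMessage reply = msg.createReply();
                reply.setPerformative(ACLMessage.INFORM);
                reply.setContent(isMalicious(msg.getContent())
                                 ? "PHISHING" : "LEGITIMATE");
                myAgent.send(reply);
            }
        });
    }

    // Placeholder for the blacklist lookup and structural analysis of Section 6.
    private boolean isMalicious(String url) {
        return false;
    }
}
```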


Figure 4. Average miss rate with respect to the number of webpages

Figure 4 shows the average miss rate with respect to the number of legitimate and phishing pages. The results show that the percentage miss rate is reasonably low. It has been seen that the suitable selection of phishing features from malicious URLs has a significant impact on the method's performance. The URL detection method has succeeded in the appropriate detection of features from phishing and legitimate sites, and thus contributed to a lower miss rate. The approach achieved an even better result using fewer features. Table 3 shows the percentage of existence of each feature in legitimate and phishing URLs. The selected features are also found in legitimate URLs; their presence there contributes to some percentage of false positives.

Table 3. Percentage of existence of URL features in legitimate and phishing URLs

Features        Legitimate URLs (%)    Phishing URLs (%)
Lexical         0.40                   40.30
Token based     0.90                   62.00
Target based    0.70                   38.70

Figure 5. Percentage of occurrence of lexical features


Figure 6. Percentage of occurrence of IP address (classwise)

Figure 5 shows the percentage of existence of the lexical features in the selected set of phishing URLs. Figure 6 shows the analysis of IP addresses from different classes present in the set of phishing URLs. Ten suspicious tokens that appear frequently in phishing URLs were selected. Figure 7 shows the number of occurrences of these suspicious tokens in the selected phishing URL set.

Figure 7. Number of occurrences of suspicious tokens


Figure 8. Percentage of phishing URLs from popular targets

Figure 8 shows the percentage of phishing URLs identified from the selected targets. Figure 9 shows a comparative analysis, in terms of precision, of the URL structural analysis module used in the proposed method against two existing URL analysis methods. The first uses k-means clustering [16] to analyze the URL components, and the second uses an SVM classification method [9] to classify the features. K-means clustering is not an accurate method, and it results in a larger number of false positives. Even though SVM classification provides accuracy, its execution incurs extra overhead for the total execution process. The result shows that the proposed URL analysis method (Simple URL Check) outperforms the existing methods in terms of accuracy.

Figure 9. Comparative Analysis of URL Analysis Methods


8. Conclusion

As phishing sites are a cornerstone of internet criminal activities, there has been broad interest in developing systems to prevent the end user from visiting such sites. However, the effectiveness of such systems is questionable, as phishers change their tactics and materialize with new threats. This paper presented the design and evaluation of a heuristic-based approach to detect phishing URLs leading to identity theft and financial losses. The method uses a multi agent system along with case based reasoning. As phishing sites are short-lived, a blacklist needs many hours to become effective at detecting phishing sites, which may generate false alarms and missed detections. In order to alleviate this problem, each suspected URL undergoes a structural analysis that classifies the URL according to twenty-five extracted features. Remarkably, this method has the virtue that the adversary has very little possibility to evade detection, in comparison to other anti-phishing schemes. In the framework, there is less chance of false positives, as the method has succeeded in the suitable selection of URL features. The framework can be extended to counter old as well as new phishing scams and thus remain robust over time. This can be achieved by updating the URL features by analyzing the pattern of recent URLs.

References

[1] Ehrenfeld, J. M. WannaCry, cybersecurity and health information technology: A time to act. Journal of Medical Systems, 41(7), (2017), pp. 104.
[2] Miyamoto, D., Hazeyama, H., & Kadobayashi, Y. SPS: A simple filtering algorithm to thwart phishing attacks. 1st Asian Internet Engineering Conference on Technologies for Advanced Heterogeneous Networks, (2005), pp. 195-209.
[3] Ramesh, G., Krishnamurthi, I., & Kumar, K. S. S. An efficacious method for detecting phishing webpages through target domain identification. Decision Support Systems, 61, (2014), pp. 12-22.
[4] Sengar, P. K., & Kumar, V. Client-side defense against phishing with PageSafe. International Journal of Computer Applications, 4(4), (2010), pp. 6-10.
[5] Ma, J., Saul, L. K., Savage, S., & Voelker, G. M. Beyond blacklists: Learning to detect malicious web sites from suspicious URLs. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, (2009), pp. 1245-1254.
[6] Ma, J., Saul, L. K., Savage, S., & Voelker, G. M. Identifying suspicious URLs: An application of large-scale online learning. 26th Annual International Conference on Machine Learning, ACM, (2009), pp. 681-688.
[7] Ma, J., Saul, L. K., Savage, S., & Voelker, G. M. Learning to detect malicious URLs. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), (2011), Article 30.
[8] Patil, R., Dhamdhere, B. D., Dhonde, K. S., Chinchwade, R. G., & Mehetre, S. B. A hybrid model to detect phishing-sites using clustering and Bayesian approach. IJCSNS International Journal of Computer Science and Network Security, 15(1), (2015), pp. 91.
[9] Huang, H., Qian, L., & Wang, Y. A SVM-based technique to detect phishing URLs. Information Technology Journal, 11(7), (2012), pp. 921-925.
[10] Gyawali, B., Solorio, T., Montes-y-Gómez, M., Wardman, B., & Warner, G. Evaluating a semisupervised approach to phishing URL identification in a realistic scenario. 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, ACM, (2011), pp. 176-183.
[11] Kan, M. Y., & Thi, H. O. N. Fast webpage classification using URL features. 14th ACM International Conference on Information and Knowledge Management, ACM, (2005), pp. 325-326.
[12] Berger, A. L., Pietra, V. J. D., & Pietra, S. A. D. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1), (1996), pp. 39-71.
[13] Nguyen, L. A. T., To, B. L., Nguyen, H. K., & Nguyen, M. H. A novel approach for phishing detection using URL-based heuristic. IEEE International Conference on Computing, Management and Telecommunications (ComManTel), (2014), pp. 298-303.
[14] Kadhane, M. D., & Hambir, N. Detection of phishing URL using text feature extraction. Multidisciplinary Journal of Research in Engineering and Technology, 2(4), (2015), pp. 776-782.
[15] Basnet, R. B., Sung, A. H., & Liu, Q. Learning to detect phishing URLs. International Journal of Research in Engineering and Technology, 3(6), (2014), pp. 11-24.
[16] Zalik, K. R. An efficient k-means clustering algorithm. Pattern Recognition Letters, 29(9), (2008), pp. 1385-1391.


[17] Sarika, S., & Paul, V. Parallel phishing attack recognition using software agents. Journal of Intelligent & Fuzzy Systems, 32(5), (2017), pp. 3273-3284.
[18] Sarika, S., & Paul, V. AgentTab: An agent based approach to detect tabnabbing attack. Procedia Computer Science, 46, (2015), pp. 574-581.
[19] Sarika, S., & Paul, V. Distributed software agents for antiphishing. International Journal of Computer Science Issues (IJCSI), 10(3), (2013), pp. 125-130.
[20] Sarika, S., & Paul, V. Intelligent agents in securing internet. Journal of Internet Technology, 19(3), (2018), pp. 753-763.
[21] PhishTank - Out of the Net, into the Tank. Retrieved from PhishTank: http://www.phishtank.com, (2017).
[22] Murphy, K. P. Naive Bayes Classifiers. University of British Columbia, (2006).
[23] Aamodt, A., & Plaza, E. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7, (1994), pp. 39-59.
[24] Sycara, K. P. Multiagent systems. AI Magazine, 19(2), (1998), pp. 79-92.
[25] Oprea, M. Applications of multi-agent systems. In Information Technology, Springer, Boston, MA, (2004), pp. 239-270.
