E-Business's Page Ranking with Ant Colony Ranking Algorithm

E-Business's Page Ranking with Ant Colony Ranking Algorithm

E-Business’s Page Ranking with Ant Colony Algorithm Asst. Prof. Chonawat Srisa-an, Ph.D. Faculty of Information Technology, Rangsit University 52/347 Phaholyothin Rd. Lakok Pathumthani, 12000 [email protected], [email protected] Abstract Keywords: web mining, text mining, information retrieval, search engines, Expert Almost all E-Business transactions use Systems and AI in e-Business classical keyword-based methods for searching commercial goods. One of the main problems that plague modern search 1. Introduction engines is poor relevance. In many situations, search engines retrieve thousands of pages In many situations, search engines at least partially satisfying input query. Of retrieve thousands of pages at least partially course, human attention remains more or satisfying input query. In E-business world, less constant, and most humans are able to people have been overwhelmed with those cope only with about 100-200 retrieved non-relevant pages every day. PageRank documents. succeeds in such situations by separating PageRank index method has been highly respected sites from junk pages that successfully used for search engine output only happened to contain words from the sorting. Two such experimental search query. PageRank is therefore a method that engines have been constructed at Stanford would allow us to approximate page University [1]. There seems to be quite high importance for the user, regardless of its correlation between high PageRank index, relevance to the query measured by classical and general page importance judged by methods. Having such approximation -in fact human users. This is especially spectacular a ranking score -we can either sort the pages, for very general queries, for which a lot of presenting these with higher rank first, or relevant pages exist in the Internet. even entirely remove these which have very This work proposes an algorithm for poor rank thus allowing user to concentrate page rank called Ant Colony Ranking (ACR) on a set of pages which has reasonable algorithm. The goal of this paper is to quantity. investigate the Ant Colony Ranking (ACR) There are two drawbacks of this algorithm in the context of such self- classical method. First, one of the most modifying systems to solve two drawbacks of important drawbacks of this methodology is this method. The experiments show that the the necessity to have access to (ideally) algorithm has good convergence properties entire Web structure to perform proper over hypertext structure of the Internet. computation. This is possible only in large Proceedings of the Fourth International Conference on eBusiness, November 19-20, 2005, Bangkok, Thailand 27.1 systems that maintain entire Web hyperlink 1.PR(Tn) - Each page has a notion of its databases [4] such as general-purpose search own self-importance. That’s “PR(T1)” engines and web crawlers. The first one for the first page in the web all the way emphasizes the research effort on Web up to “PR(Tn)” for the last page topology and structural dependencies 2.C(Tn) - Each page spreads its vote out between links and documents, without evenly amongst all of it’s outgoing examining contents of hypertext documents. links. The count, or number, of It proves that the structure of Web itself outgoing links for page 1 is “C(T1)”, carries lots of useful information that should “C(Tn)” for page n, and so on for all be retrieved. pages. Secondly, the other drawback of this 3.PR(Tn)/C(Tn) - so if our page (page A) classical methodology is the difficulty to find has a backlink from page “n” the share out how many times we need to repeat the of the vote page A will get is calculation for big networks. That’s a “PR(Tn)/C(Tn)” difficult question; for a network as large as 4.d(... - All these fractions of votes are the World Wide Web [2] it can be many added together but, to stop the other millions of iterations! The “damping factor” pages having too much influence, this is quite subtle. If it’s too high then it takes total vote is “damped down” by ages for the numbers to settle, if it’s too low multiplying it by 0.85 (the factor “d”) then you get repeated over-shoot, both above 5.(1 - d) - The (1 – d) bit at the beginning and below the average - the numbers just is a bit of probability math magic so the swing about the average like a pendulum and “sum of all web pages’ PageRanks will never settle down. Also choosing the order be one”: it adds in the bit lost by the of calculations can help. The answer will d(.... It also means that if a page has no always come out the same no matter which links to it (no backlinks) even then it order you choose, but some orders will get will still get a small PR of 0.15 (i.e. 1 – you there quicker than others. 0.85). The goal of this paper is to Note that the PageRanks form a investigating the Ant Colony Ranking (ACR) probability distribution over web pages, so algorithm in the context of such self- the sum of all web pages’ PageRanks will be modifying systems to solve two drawbacks one. of a classical ranking method. The ACR PageRank or PR(A) can be calculated software is to construct a web structure and using a simple iterative algorithm, and compute ranking score for each page. corresponds to the principal eigenvector of the normalized link matrix of the web. 2. Standard Pagerank Formula 3. Ant Colony Ranking Algorithm We assume page A has pages T1...Tn which point to it (i.e., are citations). The Ant Colony Ranking (ACR) algorithm parameter d is a damping factor which can is a system based on agents which simulate be set between 0 and 1. Also C (A) is defined the natural behavior of ants, including as the number of links going out of page A. mechanisms of cooperation and adaptation. The PageRank of a page A is given as ACR algorithms are based on the follows: following RULES: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... +PR(Tn)/C(Tn)) Special Issue of the International Journal of the Computer, the Internet and Management, Vol.13 No.SP3, November, 2005 27.2 • Each path followed by an ant is route. Assume that there is only associated with a candidate solution available food for one ant; therefore, for a given problem. once an ant reaches the destination • When an ant follows a path, the (food), it eliminates its pheromone on amount of pheromone deposited on the way back to its nest to prevent that path is proportional to the quality loop. of the corresponding candidate • Artificial ants implement loop solution for the target problem. elimination by eliminate all Artificial ants have a probabilistic pheromone and deposit backward preference for paths with a larger pheromone to set up a path. The amount of pheromone. Two kinds of permanent path is created by pheromones are forward pheromone “backward pheromone”. and backward pheromone on • Each ant that following a forward different purposes. pheromone has to give up its search • Each destination has food available once it finds backward pheromone on for only one artificial ant; therefore, the path. each node ant needs to visit only once. • In ACR system, the ants memorize When an ant has to choose between the node they visit during the forward two or more paths, the path(s) with a path and exchange its journey larger amount of pheromone have a knowledge with other once it reaches greater probability of being chosen its nest. The permanent path was by the ant. created by backward pheromone and • An appropriate representation of the no longer use. At this point in time, problem, which allows the ants to information about incoming and incrementally construct/modify outgoing for each node is store in a solutions through the use of a database. probabilistic transition rule, based on the amount of pheromone in the trail. Note: For clustering purpose, there are • Artificial Ants can be thought of possibly many nests in the ACR system. The having two working forward and ACR software reduces size problems by backward modes. They are in clustering methods that decompose such forward mode when they are moving problems into smaller ones. from the nest toward the food, and they are in backward mode when 4. Ant Colony Ranking Software they are moving from the food backward mode when they are In order to solve Web structure moving from the food back to their problem, this research replaces web crawler nest. Once an artificial ant in forward by Artificial Ants for the path construction mode reaches its destination, it phase and use web log analysis for better switches to backward mode and starts understanding of reference page. Since some its travel back to the source on the reference path is hard to detect in WWW same route they previously used and environment since any page can point to the switch to its backward pheromone. page anytime, web log is used for a better • A rule for pheromone updating, result but not the main part of the system. which specifies how to modify the ACR software creates enough pheromone trail (t). Artificial Ants “Artificial Ants” to explore Web structure. use pheromone trail to guide the All artificial ants must return their nest once Proceedings of the Fourth International Conference on eBusiness, November 19-20, 2005, Bangkok, Thailand 27.3 they reach a food (destination). Each ant backward pheromone and was no longer detects <a href=“...”> and </a> tags in source used.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us