The PageRank Citation Ranking: Bringing Order to the Web Paper Review
Anand Singh Kunwar

1 Introduction

The PageRank [1] algorithm describes a way to compute ranks, or ratings, of web pages. It starts from a simple intuition: treat the web as a directed graph in which each node is a web page and each link is an edge. Every page has some links pointing to it and some links pointing away from it. The following graphical representation shows three such pages.

[Figure: directed graph on three pages a, b and c]

The pages a, b and c have some outgoing and incoming edges. We compute the PageRank of these pages by iterating the following update:

$$\mathrm{Rank}(u) = c \sum_{v \in L_u} \frac{\mathrm{Rank}(v)}{N_v} + c\,E(u)$$

Here $N_v$ is the number of pages linked to by $v$, $L_u$ is the set of pages that link to $u$, $E(u)$ is the entry for $u$ of a vector that acts as a source of rank, and $c$ is the normalising constant.

The term $E(u)$ was introduced to handle loops of pages that have no out-edges. Such a loop accumulates rank on every iteration but never passes any on, so the ranks of the nodes inside it would blow up.

The paper also gives a real-life intuition for this in the form of a random surfer model. In this model, a web surfer clicks through consecutive pages like a random walk on the graph. If the surfer ends up in a loop of web pages with no outgoing links other than those inside the loop, it is highly unlikely that they will keep going round it forever; they are more likely to jump to some random page. To model this behaviour, a factor for random jumps to other web pages is included. This factor is our $E(u)$.

2 Example

This demonstration shows the PageRank algorithm at work on three pages a, b and c, with links as edges.

[Figure: directed graph on three pages a, b and c]

We assume that all pages initially have rank 1.
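The iteration in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation. It uses the damped form of the update with d = 0.85 that this example works with, and it updates all pages synchronously from the previous iteration's ranks. The link structure in `LINKS` (a links to b and c, b links to c, c links to a) is an assumption: the figure is not reproduced in the text, so the edges were inferred from the ranks reported in this example. The names `LINKS`, `pagerank_step` and `pagerank` are hypothetical.

```python
# Assumed link structure: page -> pages it links to.
# Not stated in the text; inferred from the ranks reported in the example.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}


def pagerank_step(ranks, links, d=0.85):
    """One synchronous update: Rank(u) = d * sum(Rank(v) / N_v) + (1 - d).

    Assumes every page has at least one outgoing link and that every
    link target is itself a key of `links`.
    """
    new_ranks = {page: 1.0 - d for page in links}
    for v, targets in links.items():
        share = ranks[v] / len(targets)  # Rank(v) / N_v
        for u in targets:
            new_ranks[u] += d * share
    return new_ranks


def pagerank(links, d=0.85, iterations=20):
    """Start every page at rank 1 and apply the update `iterations` times."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        ranks = pagerank_step(ranks, links, d)
    return ranks


# Print the ranks after the first two iterations, then after 20.
ranks = {page: 1.0 for page in LINKS}
for i in (1, 2):
    ranks = pagerank_step(ranks, LINKS)
    print(i, {p: round(r, 3) for p, r in sorted(ranks.items())})
print(20, {p: round(r, 3) for p, r in sorted(pagerank(LINKS).items())})
```

Under the assumed edges, the printed ranks should agree to three decimals with the values reported in this example. The sketch does not handle pages without outgoing links, which would need separate treatment.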
The paper states the value of $cE(u)$ to be $(1 - d)$, where $d = 0.85$:

$$\mathrm{Rank}(u) = d \sum_{v \in L_u} \frac{\mathrm{Rank}(v)}{N_v} + (1 - d)$$

After the first iteration:

Rank(a) = 1.000
Rank(b) = 0.575
Rank(c) = 1.425

After the second iteration:

Rank(a) = 1.361
Rank(b) = 0.575
Rank(c) = 1.064

After about 20 iterations, the ranks roughly converge to:

Rank(a) = 1.163
Rank(b) = 0.644
Rank(c) = 1.192

3 Conclusion

One of the most important aspects of this algorithm is the damping factor d. The value of d is taken to be 0.85, based on the heuristic that roughly one in every six page visits is a random jump rather than a followed link. PageRank is not limited to web pages; the same scheme can also be applied to research paper citations.

Another important aspect of this paper is that it considers not only the quantity of backlinks (incoming links) but also their quality, where a higher PageRank implies higher quality. For example, if your personal blog has a few backlinks from highly ranked websites, while your friend's personal blog has many backlinks from low-ranked personal blogs, your computed PageRank could still be higher than your friend's. That is the beauty of the PageRank algorithm.

References

[1] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999. Previous number = SIDL-WP-1999-0120.