Improving Website Hyperlink Structure Using Server Logs

Ashwin Paranjape∗ (Stanford University), Robert West∗ (Stanford University), Leila Zia (Wikimedia Foundation), Jure Leskovec (Stanford University)
ABSTRACT

Good websites should be easy to navigate via hyperlinks, yet maintaining a high-quality link structure is difficult. Identifying pairs of pages that should be linked may be hard for human editors, especially if the site is large and changes frequently. Further, given a set of useful link candidates, the task of incorporating them into the site can be expensive, since it typically involves humans editing pages. In the light of these challenges, it is desirable to develop data-driven methods for automating the link placement task. Here we develop an approach for automatically finding useful hyperlinks to add to a website. We show that passively collected server logs, beyond telling us which existing links are useful, also contain im-

quently [7, 21]. For example, about 7,000 new pages are created on Wikipedia every day [35]. As a result, there is an urgent need to keep the site's link structure up to date, which in turn requires much manual effort and does not scale as the website grows, even with thousands of volunteer editors as in the case of Wikipedia. Neither is the problem limited to Wikipedia; on the contrary, it may be even more pressing on other sites, which are often maintained by a single webmaster. Thus, it is important to provide automatic methods to help maintain and improve website hyperlink structure.

Considering the importance of links, it is not surprising that the problems of finding missing links and predicting which links will form in the future have been studied previously [17, 19, 22, 32].
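To make the idea of mining server logs for link candidates concrete, the following is a minimal sketch, not the pipeline developed in this paper: assuming the log has already been grouped into navigation sessions of (session_id, timestamp, page) records, it counts two-step paths whose endpoints are not yet directly linked. Pages that users frequently reach only indirectly are plausible targets for a new direct link. All names and the input format here are hypothetical.

    from collections import Counter
    from itertools import groupby

    # Toy server-log sample: (session_id, timestamp, page) records,
    # sorted by session and time. Field layout and page names are
    # illustrative only, not the format used in the paper.
    requests = [
        ("s1", 1, "A"), ("s1", 2, "B"), ("s1", 3, "C"),
        ("s2", 1, "A"), ("s2", 2, "B"), ("s2", 3, "C"),
        ("s3", 1, "B"), ("s3", 2, "C"),
    ]

    # Links that already exist on the site (toy example).
    existing_links = {("A", "B"), ("B", "C")}

    # Count two-step navigation paths a -> b -> c whose endpoints (a, c)
    # are not yet linked; frequent indirect paths suggest candidate links.
    candidates = Counter()
    for _, session in groupby(requests, key=lambda r: r[0]):
        pages = [page for _, _, page in session]
        for a, b, c in zip(pages, pages[1:], pages[2:]):
            if a != c and (a, c) not in existing_links:
                candidates[(a, c)] += 1

    print(candidates.most_common(5))  # e.g. [(('A', 'C'), 2)]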