PageRank
Andrey Karnauch & Dakota Sanders

Questions
1. Why is the algorithm named PageRank?
2. What algorithm is used to converge on values in PageRank?
3. Why is PageRank better than simple citation counting?

Presenter - Andrey Karnauch
- UTK's Computer Science Masters Program
- Advisor: Dr. Mockus
- Born in Binghamton, NY
- Moved to Chattanooga, TN in 2006
- Parents are from Ukraine (Soviet Union)
- Likes to watch other people play video games, backpack, and sometimes work out (RIP TRECS)
[Photos: Andrey Karnauch; Ukrainian food favorites]

Presenter - Dakota / Cody Sanders (I go by Dakota and Cody; people in this class will know me as either one or the other)
- From Chattanooga, TN
- Masters in Comp. Sci. this May
- Full-time Django developer in Knoxville
- I have a dog named Boone (Mountain Cur, 1.5 years old)
- Recently started rock climbing (I am not that good)
- I don't take pictures of my food )^:
- I like going to the gym, playing board games with friends, and playing video games
[Photos: a V0 climb at Stone Fort; bouldering in Soddy Daisy, TN]

Table of Contents
- Overview
- History
- What is PageRank?
- An Example of the Power Method
- Converging
- Other Applications
- Implementation
- Experiment
- Open Issues
- References
- Discussion

Overview
- We present PageRank in its original context: ranking web pages
- Other applications of PageRank are discussed later
- The following terms are used interchangeably throughout:
  - Pages and nodes
  - Links, hyperlinks, and edges
- The PageRank algorithm has many moving parts
- We try to cover all of PageRank before showing a full-fledged example

Section 1 - History of PageRank
Source: https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a

History - The World Wide Web
- As the Web grew in the 1990s, search engines were needed to index and find Web pages
- Several companies emerged (e.g. WWWW, AltaVista, WebCrawler)
- 1994 (~1,500 queries per day) vs. 1997 (20M+ queries per day)
- By 1997, only 1 of the top 4 commercial search engines "found" itself
- Enter: Google research (~1995-1999)
  - Larry Page and Sergey Brin, Stanford grads (worth ~$50B each now)
  - Wanted an academic search engine that emphasized search quality and scalability
  - At the heart of this was the PageRank algorithm
  - Ultimately led to the functional prototype: Google

History - The World Wide Web
- The search engine RankDex was developed by Robin Li in 1996
  - Popularity of a website was based on the "links" to it - similar to PageRank
  - Larry Page cited Robin Li in the PageRank patent
  - Robin Li founded Baidu, built on RankDex, in 2000
- At the core, both parties borrowed from the idea of citation analysis
  - Eugene Garfield in the 1950s
- And eigenvector centrality
  - Phillip Bonacich in 1986

Section 2 - What is PageRank?
Source: https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a

What is PageRank?
From Google: "The basis of Google's search technology is called PageRank™, and assigns an 'importance' value to each page on the web and gives it a rank to determine how useful it is. However, that's not why it's called PageRank. It's actually named after Google co-founder Larry Page."
*https://web.archive.org/web/20010715123343/https://www.google.com/press/funfacts.html

Setting Up
- Construct a directed graph with webpages as nodes and links as edges
- A page can have any number of forward links or backlinks
- It is impossible to know whether all backlinks have been collected, but forward links are available by downloading the page
[Figure: A & B are backlinks of C]

Why use PageRank?
- Intuitively, pages with many backlinks are "important" (i.e.
a high citation ranking)
- However, if a page has only one backlink, but that backlink is from google.com, we can also consider it to be important
- PageRank handles this case much better than simple citation-counting methods by accounting for the "importance" of each page

Definition - Basic
A page has a high rank if the sum of the ranks of its backlinks is high.
Formally: let u be a webpage with forward links Fᵤ and backlinks Bᵤ. Let Nᵤ = |Fᵤ| be the number of forward links of u, and let c be a factor used for normalization. Then:

    R(u) = c · Σ_{v ∈ Bᵤ} R(v) / Nᵥ

[Figure: a small example in which ranks (values of 10 and 5) flow from pages A, B, and C along their forward links]

Random Surfer Interpretation
- Imagine a user browsing the network, clicking links at random
- At each time step, the user chooses a page to visit at random
- The "importance" ranking/PageRank of a page is essentially the limiting probability that the random walk will be at that node after a sufficiently long time

Matrix Representation
- How do we determine initial and final ranks?
  - Initial ranks: any set of ranks you want to use
  - Final ranks: iterate the above computation until convergence
- This requires us to convert the problem into a matrix representation
- Specifically, start with a square matrix L where each entry Lᵤ,ᵥ = 1/Nᵤ if there is an edge from u to v, otherwise it is 0

Matrix Representation
- Lᵤ,ᵥ = 1/Nᵤ if there is an edge from u to v, otherwise it is 0
- Example graph: A → B, A → C, B → C, C → A

         A    B    C
    A  [ 0    0    1 ]
    B  [ ½    0    0 ]
    C  [ ½    1    0 ]

Dangling Nodes
- What happens if a node has no forward links?
- Add a page D with no outgoing links (with C now linking to both A and D):

         A    B    C    D
    A  [ 0    0    ½    0 ]
    B  [ ½    0    0    0 ]
    C  [ ½    1    0    0 ]
    D  [ 0    0    ½    0 ]

Dangling Nodes
- How do we work with dangling nodes?
- If a random surfer reaches a page with no outgoing links, they will most likely not stay on that page
- Instead, they randomly choose another page to continue surfing
- What does this mean for our matrix representation?
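Before tackling that question, the basic construction above (Lᵤ,ᵥ = 1/Nᵤ for each edge u → v) can be sketched in a few lines of Python. This is a minimal sketch, not the deck's implementation; it rebuilds the three-page example matrix directly from each page's forward links:

```python
import numpy as np

# Forward links of the three-page example: A -> B, C;  B -> C;  C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

pages = sorted(links)                    # fix an ordering: A, B, C
idx = {p: i for i, p in enumerate(pages)}

L = np.zeros((len(pages), len(pages)))
for u, forward in links.items():
    for v in forward:                    # edge u -> v contributes 1/N_u
        L[idx[v], idx[u]] = 1 / len(forward)

print(L)   # columns match the slide: (0, ½, ½), (0, 0, 1), (1, 0, 0)
```

Each column of L belonging to a page with outgoing links sums to 1; a dangling page would show up as an all-zero column, which is exactly the case the dangling-node discussion addresses.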
- The dangling node's column will originally be all zeros
- Instead, replace each zero with 1/N (N = total number of pages), so every page has an equal chance of being visited next

Dangling Node Correction
- What does this mean for our matrix representation?

         A    B    C    D              A    B    C    D
    A  [ 0    0    ½    0 ]       A  [ 0    0    ½    ¼ ]
    B  [ ½    0    0    0 ]  -->  B  [ ½    0    0    ¼ ]
    C  [ ½    1    0    0 ]       C  [ ½    1    0    ¼ ]
    D  [ 0    0    ½    0 ]       D  [ 0    0    ½    ¼ ]

Random Surfer - Damping Factor
- In our random surfer model, it is possible for a user to get stuck in a cycle of pages that link to each other
- How does PageRank account for this?
  - Assign a damping factor d (a value between 0 and 1)
  - Set to 0.85, according to Page and Brin
  - With probability 1 − d, the surfer jumps to a random page at any time step

Calculating PageRank
- To determine the PageRank of each node, we need to find the eigenvector R of L with eigenvalue c:

    R = cLR

- How do we find eigenvector R and eigenvalue c?
  - Use the power iteration method!
  - It finds an eigenvector of a square matrix corresponding to the eigenvalue with the largest magnitude

Section 3 - An Example of the Power Method
- Initialize a directed graph of webpages A, B, C, D
- A page has an outgoing edge to each page it links to
- As an example, webpage A has links to pages B, C, and D, as shown in the graph diagram
[Figure: the example graph; the link vectors below imply A → B, C, D; B → D; C → A, D; D → B, C]

First, find the link vector for webpage A, normalizing by the number of links, 3:

           A    B    C    D
    A:   [ 0    ⅓    ⅓    ⅓ ]

Then, find the link vector for webpage B:

    B:   [ 0    0    0    1 ]

Continue on for webpages C and D:

    C:   [ ½    0    0    ½ ]
    D:   [ 0    ½    ½    0 ]

Convert each link vector to a column of a square matrix:

         A    B    C    D
    A  [ 0    0    ½    0 ]
    B  [ ⅓    0    0    ½ ]
    C  [ ⅓    0    0    ½ ]
    D  [ ⅓    1    ½    0 ]

Notice that columns are outward links, and rows are now inward links.

Now set up a vector R to hold the ranks of the pages; for page A:

    R_A = ½ · R_C

For the entire matrix:

    R = L · R

Since we don't have an initial rank for any page, assume equal probabilities and normalize:

    R⁰ = (¼, ¼, ¼, ¼)ᵀ

Since we update R each time we calculate this, our notation for the entire iterative process becomes:

    Rⁱ⁺¹ = L · Rⁱ

Notice now that the converged R is an eigenvector of L with eigenvalue 1. Because of how we have constructed L (a column-stochastic square matrix), it has properties that assure the rank vector returned from the power method is optimal.

In order to account for our damping factor d, we transform our original iterative process as follows:

    Rⁱ⁺¹ = d · L · Rⁱ + (1 − d)/N

Now, iteratively calculate until convergence!

            A       B       C       D
    R⁰    0.2500  0.2500  0.2500  0.2500
    R¹    0.1250  0.2083  0.2083  0.4583
    R²    0.1354  0.2118  0.2118  0.4410
    ...
    R¹²   0.1200  0.2400  0.2400  0.4000
    R¹³   0.1200  0.2400  0.2400  0.4000

Number of Iterations
- O(log n) iterations is expected when using d = 0.85
- With a damping factor of 0.85, the original PageRank paper took ~50 iterations to converge on a graph of 322 million links
- The larger your damping factor, the longer it takes to converge; 0.85 is a sweet spot

Section 4 - Other Applications

Other Applications
- The PageRank algorithm is not unique to just the web! Some other examples include:
  a. Sports rankings
  b. Literature - finding the most original authors
  c. Neuroscience
  d. Toxic waste management
  e. Debugging (MonitorRank)
  f. Predicting traffic flow (including human mobility)
  g. Recommending people to follow (used by Twitter)
Source: https://arxiv.org/pdf/1407.5107v1.pdf

Implementation - Our Experiment
- Based on similar sports rankings previously done for tennis players
- To test the effectiveness of PageRank, we experimented with professional chess players
  a. Downloaded the Caissabase chess database and extracted ~4 million chess games
- The experiment is as follows:
  a. Create a directed graph with players as nodes
  b. An edge is created from the loser of a match to the winner
  c.
By running PageRank on this graph, we can (in theory) test how influential the player with the highest PageRank is

The Results
- Player with the highest PageRank: Korchnoi, Viktor
- "He is considered one of the strongest players never to have become World Chess Champion."

Section 5 - Conclusion

Open Issues
- Several attempts to manipulate PageRank over the years:
  - Spamdexing (search engine poisoning)
  - Creating tons of blog posts linking to your site
  - Buying and selling links from "important" websites
  - "nofollow" HTML attribute abuse
- PageRank was the core of Google originally, but now it is just one of many working parts in Google's search engine
- Google hides these internals nowadays to prevent further abuse

Valuable Sources
- https://patents.google.com/patent/US7058628B1/en
- http://infolab.stanford.edu/pub/papers/google.pdf
- http://ilpubs.stanford.edu:8090/697/1/2005-33.pdf
- https://sci2s.ugr.es/sites/default/files/files/TematicWebSites/hindex/PinskiNarin1976.pdf
- http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
- http://www.ams.org/publicoutreach/feature-column/fcarc-pagerank

Suggested Discussion Topics
- Have any of you used PageRank for any application? If so, how did it turn out?
- Can you think of other applications where PageRank could be valuable?