Google's Pagerank and Beyond: the Science of Search Engine Rankings

Total Page:16

File Type:pdf, Size:1020Kb

Google's Pagerank and Beyond: the Science of Search Engine Rankings Google’s PageRank and Beyond: The Science of Search Engine Rankings This page intentionally left blank Google’s PageRank and Beyond: The Science of Search Engine Rankings Amy N. Langville and Carl D. Meyer PRINCETON UNIVERSITY PRESS PRINCETON AND OXFORD This page intentionally left blank Contents Preface ix Chapter 1. Introduction to Web Search Engines 1 1.1 A Short History of Information Retrieval 1 1.2 An Overview of Traditional Information Retrieval 5 1.3 Web Information Retrieval 9 Chapter 2. Crawling, Indexing, and Query Processing 15 2.1 Crawling 15 2.2 The Content Index 19 2.3 Query Processing 21 Chapter 3. Ranking Webpages by Popularity 25 3.1 The Scene in 1998 25 3.2 Two Theses 26 3.3 Query-Independence 30 Chapter 4. The Mathematics of Google’s PageRank 31 4.1 The Original Summation Formula for PageRank 32 4.2 Matrix Representation of the Summation Equations 33 4.3 Problems with the Iterative Process 34 4.4 A Little Markov Chain Theory 36 4.5 Early Adjustments to the Basic Model 36 4.6 Computation of the PageRank Vector 39 4.7 Theorem and Proof for Spectrum of the Google Matrix 45 Chapter 5. Parameters in the PageRank Model 47 5.1 The α Factor 47 5.2 The Hyperlink Matrix H 48 5.3 The Teleportation Matrix E 49 Chapter 6. The Sensitivity of PageRank 57 6.1 Sensitivity with respect to α 57 vi CONTENTS 6.2 Sensitivity with respect to H 62 T 6.3 Sensitivity with respect to v 63 6.4 Other Analyses of Sensitivity 63 6.5 Sensitivity Theorems and Proofs 66 Chapter 7. The PageRank Problem as a Linear System 71 7.1 Properties of (I − αS) 71 7.2 Properties of (I − αH) 72 7.3 Proof of the PageRank Sparse Linear System 73 Chapter 8. Issues in Large-Scale Implementation of PageRank 75 8.1 Storage Issues 75 8.2 Convergence Criterion 79 8.3 Accuracy 79 8.4 Dangling Nodes 80 8.5 Back Button Modeling 84 Chapter 9. Accelerating the Computation of PageRank 89 9.1 An Adaptive Power Method 89 9.2 Extrapolation 90 9.3 Aggregation 94 9.4 Other Numerical Methods 97 Chapter 10. Updating the PageRank Vector 99 10.1 The Two Updating Problems and their History 100 10.2 Restarting the Power Method 101 10.3 Approximate Updating Using Approximate Aggregation 102 10.4 Exact Aggregation 104 10.5 Exact vs. Approximate Aggregation 105 10.6 Updating with Iterative Aggregation 107 10.7 Determining the Partition 109 10.8 Conclusions 111 Chapter 11. The HITS Method for Ranking Webpages 115 11.1 The HITS Algorithm 115 11.2 HITS Implementation 117 11.3 HITS Convergence 119 11.4 HITS Example 120 11.5 Strengths and Weaknesses of HITS 122 11.6 HITS’s Relationship to Bibliometrics 123 11.7 Query-Independent HITS 124 11.8 Accelerating HITS 126 11.9 HITS Sensitivity 126 CONTENTS vii Chapter 12. Other Link Methods for Ranking Webpages 131 12.1 SALSA 131 12.2 Hybrid Ranking Methods 135 12.3 Rankings based on Traffic Flow 136 Chapter 13. The Future of Web Information Retrieval 139 13.1 Spam 139 13.2 Personalization 142 13.3 Clustering 142 13.4 Intelligent Agents 143 13.5 Trends and Time-Sensitive Search 144 13.6 Privacy and Censorship 146 13.7 Library Classification Schemes 147 13.8 Data Fusion 148 Chapter 14. Resources for Web Information Retrieval 149 14.1 Resources for Getting Started 149 14.2 Resources for Serious Study 150 Chapter 15. The Mathematics Guide 153 15.1 Linear Algebra 153 15.2 Perron–Frobenius Theory 167 15.3 Markov Chains 175 15.4 Perron Complementation 186 15.5 Stochastic Complementation 192 15.6 Censoring 194 15.7 Aggregation 195 15.8 Disaggregation 198 Chapter 16. Glossary 201 Bibliography 207 Index 219 This page intentionally left blank Preface Purpose As teachers of linear algebra, we wanted to write a book to help students and the general public appreciate and understand one of the most exciting applications of linear algebra today—the use of link analysis by web search engines. This topic is inherently interesting, timely, and familiar. For instance, the book answers such curious questions as: How do search engines work? Why is Google so good? What’s a Google bomb? How can I improve the ranking of my homepage in Teoma? We also wanted this book to be a single source for material on web search engine rank- ings. A great deal has been written on this topic, but it’s currently spread across numerous technical reports, preprints, conference proceedings, articles, and talks. Here we have summarized, clarified, condensed, and categorized the state of the art in web ranking. Our Audience We wrote this book with two diverse audiences in mind: the general science reader and the technical science reader. The title echoes the technical content of the book, but in addition to being informative on a technical level, we have also tried to provide some entertaining features and lighter material concerning search engines and how they work. The Mathematics Our goal in writing this book was to reach a challenging audience consisting of the general scientific public as well as the technical scientific public. Of course, a complete understanding of link analysis requires an acquaintance with many mathematical ideas. Nevertheless, we have tried to make the majority of the book accessible to the general sci- entific public. For instance, each chapter builds progressively in mathematical knowledge, technicality, and prerequisites. As a result, Chapters 1-4, which introduce web search and link analysis, are aimed at the general science reader. Chapters 6, 9, and 10 are particularly mathematical. The last chapter, Chapter 15, “The Mathematics Guide,” is a condensed but complete reference for every mathematical concept used in the earlier chapters. Through- out the book, key mathematical concepts are highlighted in shaded boxes. By postponing the mathematical definitions and formulas until Chapter 15 (rather than interspersing them throughout the text), we were able to create a book that our mathematically sophisticated readers will also enjoy. We feel this approach is a compromise that allows us to serve both audiences: the general and technical scientific public. x PREFACE Asides An enjoyable feature of this book is the use of Asides. Asides contain entertaining news stories, practical search tips, amusing quotes, and racy lawsuits. Every chapter, even the particularly technical ones, contains several asides. Often times a light aside provides the perfect break after a stretch of serious mathematical thinking. Brief asides appear in shaded boxes while longer asides that stretch across multiple pages are offset by horizontal bars and italicized font. We hope you enjoy these breaks—we found ourselves looking forward to writing them. Computing and Code Truly mastering a subject requires experimenting with the ideas. Consequently, we have incorporated Matlab code to encourage and jump-start the experimentation process. While any programming language is appropriate, we chose Matlab for three reasons: (1) its matrix storage architecture and built-in commands are particularly suited to the large sparse link analysis matrices of this text, (2) among colleges and universities, Matlab is a market leader in mathematical software, and (3) it’s very user-friendly. The Matlab programs in this book are intended to be instruction, not production, code. We hope that, by playing with these programs, readers will be inspired to create new models and algorithms. Acknowledgments We thank Princeton University Press for supporting this book. We especially enjoyed working with Vickie Kearn, the Senior Editor at PUP. Vickie, thank you for displaying just the right combination of patience and gentle pressure. For a book with such timely mate- rial, you showed amazing faith in us. We thank all those who reviewed our manuscripts and made this a better book. Of course, we also thank our families and friends for their encouragement. Your pride in us is a powerful driving force. Dedication We dedicate this book to mentors and mentees worldwide. The energy, inspiration, and support that is sparked through such relationships can inspire great products. For us, it produced this book, but more importantly, a wonderful synergistic friendship. Chapter One Introduction to Web Search Engines 1.1 A SHORT HISTORY OF INFORMATION RETRIEVAL Today we have museums for everything—the museum of baseball, of baseball players, of crazed fans of baseball players, museums for world wars, national battles, legal fights, and family feuds. While there’s no shortage of museums, we have yet to find a museum ded- icated to this book’s field, a museum of information retrieval and its history. Of course, there are related museums, such as the Library Museum in Boras, Sweden, but none con- centrating on information retrieval. Information retrieval1 is the process of searching within a document collection for a particular information need (called a query). Although dominated by recent events following the invention of the computer, information retrieval actually has a long and glorious tradition. To honor that tradition, we propose the cre- ation of a museum dedicated to its history. Like all museums, our museum of information retrieval contains some very interesting artifacts. Join us for a brief tour. The earliest document collections were recorded on the painted walls of caves. A cave dweller interested in searching a collection of cave paintings to answer a particular information query had to travel by foot, and stand, staring in front of each painting. Un- fortunately, it’s hard to collect an artifact without being gruesome, so let’s fast forward a bit. Before the invention of paper, ancient Romans and Greeks recorded information on papyrus rolls.
Recommended publications
  • Identificação De Textos Em Imagens CAPTCHA Utilizando Conceitos De
    Identificação de Textos em Imagens CAPTCHA utilizando conceitos de Aprendizado de Máquina e Redes Neurais Convolucionais Relatório submetido à Universidade Federal de Santa Catarina como requisito para a aprovação da disciplina: DAS 5511: Projeto de Fim de Curso Murilo Rodegheri Mendes dos Santos Florianópolis, Julho de 2018 Identificação de Textos em Imagens CAPTCHA utilizando conceitos de Aprendizado de Máquina e Redes Neurais Convolucionais Murilo Rodegheri Mendes dos Santos Esta monografia foi julgada no contexto da disciplina DAS 5511: Projeto de Fim de Curso e aprovada na sua forma final pelo Curso de Engenharia de Controle e Automação Prof. Marcelo Ricardo Stemmer Banca Examinadora: André Carvalho Bittencourt Orientador na Empresa Prof. Marcelo Ricardo Stemmer Orientador no Curso Prof. Ricardo José Rabelo Responsável pela disciplina Flávio Gabriel Oliveira Barbosa, Avaliador Guilherme Espindola Winck, Debatedor Ricardo Carvalho Frantz do Amaral, Debatedor Agradecimentos Agradeço à minha mãe Terezinha Rodegheri, ao meu pai Orlisses Mendes dos Santos e ao meu irmão Camilo Rodegheri Mendes dos Santos que sempre estiveram ao meu lado, tanto nos momentos de alegria quanto nos momentos de dificuldades, sempre me deram apoio, conselhos, suporte e nunca duvidaram da minha capacidade de alcançar meus objetivos. Agradeço aos meus colegas Guilherme Cornelli, Leonardo Quaini, Matheus Ambrosi, Matheus Zardo, Roger Perin e Victor Petrassi por me acompanharem em toda a graduação, seja nas disciplinas, nos projetos, nas noites de estudo, nas atividades extracurriculares, nas festas, entre outros desafios enfrentados para chegar até aqui. Agradeço aos meus amigos de infância Cássio Schmidt, Daniel Lock, Gabriel Streit, Gabriel Cervo, Guilherme Trevisan, Lucas Nyland por proporcionarem momentos de alegria mesmo a distância na maior parte da caminhada da graduação.
    [Show full text]
  • Hacking the Master Switch? the Role of Infrastructure in Google's
    Hacking the Master Switch? The Role of Infrastructure in Google’s Network Neutrality Strategy in the 2000s by John Harris Stevenson A thesis submitteD in conformity with the requirements for the Degree of Doctor of Philosophy Faculty of Information University of Toronto © Copyright by John Harris Stevenson 2017 Hacking the Master Switch? The Role of Infrastructure in Google’s Network Neutrality Strategy in the 2000s John Harris Stevenson Doctor of Philosophy Faculty of Information University of Toronto 2017 Abstract During most of the decade of the 2000s, global Internet company Google Inc. was one of the most prominent public champions of the notion of network neutrality, the network design principle conceived by Tim Wu that all Internet traffic should be treated equally by network operators. However, in 2010, following a series of joint policy statements on network neutrality with telecommunications giant Verizon, Google fell nearly silent on the issue, despite Wu arguing that a neutral Internet was vital to Google’s survival. During this period, Google engaged in a massive expansion of its services and technical infrastructure. My research examines the influence of Google’s systems and service offerings on the company’s approach to network neutrality policy making. Drawing on documentary evidence and network analysis data, I identify Google’s global proprietary networks and server locations worldwide, including over 1500 Google edge caching servers located at Internet service providers. ii I argue that the affordances provided by its systems allowed Google to mitigate potential retail and transit ISP gatekeeping. Drawing on the work of Latour and Callon in Actor– network theory, I posit the existence of at least one actor-network formed among Google and ISPs, centred on an interest in the utility of Google’s edge caching servers and the success of the Android operating system.
    [Show full text]
  • Vikas Sindhwani Google May 17-19, 2016 2016 Summer School on Signal Processing and Machine Learning for Big Data
    Real-time Learning and Inference on Emerging Mobile Systems Vikas Sindhwani Google May 17-19, 2016 2016 Summer School on Signal Processing and Machine Learning for Big Data Abstract: We are motivated by the challenge of enabling real-time "always-on" machine learning applications on emerging mobile platforms such as next-generation smartphones, wearable computers and consumer robotics systems. On-device models in such settings need to be highly compact, and need to support fast, low-power inference on specialized hardware. I will consider the problem of building small-footprint non- linear models based on kernel methods and deep learning techniques, for on-device deployments. Towards this end, I will give an overview of various techniques, and introduce new notions of parsimony rooted in the theory of structured matrices. Such structured matrices can be used to recycle Gaussian random vectors in order to build randomized feature maps in sub-linear time for approximating various kernel functions. In the deep learning context, low-displacement structured parameter matrices admit fast function and gradient evaluation. I will discuss how such compact nonlinear transforms span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. I will present empirical results on mobile speech recognition problems, and image classification tasks. I will also briefly present some basics of TensorFlow: a open-source library for numerical computations on data flow graphs. Tensorflow enables large-scale distributed training of complex machine learning models, and their rapid deployment on mobile devices. Bio: Vikas Sindhwani is Research Scientist in the Google Brain team in New York City.
    [Show full text]
  • Google Pagerank and Markov Chains
    COMP4121 Lecture Notes The PageRank, Markov chains and the Perron Frobenius theory LiC: Aleks Ignjatovic [email protected] THE UNIVERSITY OF NEW SOUTH WALES School of Computer Science and Engineering The University of New South Wales Sydney 2052, Australia Topic One: the PageRank Please read Chapter 3 of the textbook Networked Life, entitled \How does Google rank webpages?"; other references on the material covered are listed at the end of these lecture notes. 1 Problem: ordering webpages according to their importance Setup: Consider all the webpages on the entire WWW (\World Wide Web") as a directed graph whose nodes are the web pages fPi : Pi 2 WWW g, with a directed edge Pi ! Pj just in case page Pi points to page Pj, i.e. page Pi has a link to page Pj. Problem: Rank all the webpages of the WWW according to their \importance".1 Intuitively, one might feel that if many pages point to a page P0, then P0 should have a high rank, because if one includes on their webpage a link to page P0, then this can be seen as their recommendation of P0. However, this is not a good criterion for several reasons. For example, it can be easily manipulated to increase the rating of any webpage P0, simply by creating a lot of silly web pages which just point to P0; also, if a webpage \generously" points to a very large number of webpages, such \easy to get" recommendation is of dubious value. Thus, we need to refine our strategy how to rank webpages according to their \importance", by doing something which cannot be easily manipulated \locally" (i.e., by any group of people who might collude, even if such a group is sizeable).
    [Show full text]
  • BRKIOT-2394.Pdf
    Unlocking the Mystery of Machine Learning and Big Data Analytics Robert Barton Jerome Henry Distinguished Architect Principal Engineer @MrRobbarto Office of the CTAO CCIE #6660 @wirelessccie CCDE #2013::6 CCIE #24750 CWNE #45 BRKIOT-2394 Cisco Webex Teams Questions? Use Cisco Webex Teams to chat with the speaker after the session How 1 Find this session in the Cisco Events Mobile App 2 Click “Join the Discussion” 3 Install Webex Teams or go directly to the team space 4 Enter messages/questions in the team space BRKIOT-2394 © 2020 Cisco and/or its affiliates. All rights reserved. Cisco Public 3 Tuesday, Jan. 28th Monday, Jan. 27th Wednesday, Jan. 29th BRKIOT-2600 BRKIOT-2213 16:45 Enabling OT-IT collaboration by 17:00 From Zero to IOx Hero transforming traditional industrial TECIOT-2400 networks to modern IoT Architectures IoT Fundamentals 08:45 BRKIOT-1618 Bootcamp 14:45 Industrial IoT Network Management PSOIOT-1156 16:00 using Cisco Industrial Network Director Securing Industrial – A Deep Dive. Networks: Introduction to Cisco Cyber Vision PSOIOT-2155 Enhancing the Commuter 13:30 BRKIOT-1775 Experience - Service Wireless technologies and 14:30 BRKIOT-2698 BRKIOT-1520 Provider WiFi at the Use Cases in Industrial IOT Industrial IoT Routing – Connectivity 12:15 Cisco Remote & Mobile Asset speed of Trains and Beyond Solutions PSOIOT-2197 Cisco Innovates Autonomous 14:00 TECIOT-2000 Vehicles & Roadways w/ IoT BRKIOT-2497 BRKIOT-2900 Understanding Cisco's 14:30 IoT Solutions for Smart Cities and 11:00 Automating the Network of Internet Of Things (IOT) BRKIOT-2108 Communities Industrial Automation Solutions Connected Factory Architecture Theory and 11:00 Practice PSOIOT-2100 BRKIOT-1291 Unlock New Market 16:15 Opening Keynote 09:00 08:30 Opportunities with LoRaWAN for IOT Enterprises Embedded Cisco services Technologies IOT IOT IOT Track #CLEMEA www.ciscolive.com/emea/learn/technology-tracks.html Cisco Live Thursday, Jan.
    [Show full text]
  • Google Inc. (Form: 8-K, Filing Date: 08/28/2007)
    SECURITIES AND EXCHANGE COMMISSION FORM 8-K Current report filing Filing Date: 2007-08-28 | Period of Report: 2007-08-27 SEC Accession No. 0001193125-07-190912 (HTML Version on secdatabase.com) FILER Google Inc. Mailing Address Business Address 1600 AMPHITHEATRE 1600 AMPHITHEATRE CIK:1288776| IRS No.: 770493581 | State of Incorp.:DE | Fiscal Year End: 1231 PARKWAY PARKWAY Type: 8-K | Act: 34 | File No.: 000-50726 | Film No.: 071084744 MOUNTAIN VIEW CA 94043 MOUNTAIN VIEW CA 94043 SIC: 7370 Computer programming, data processing, etc. 650 623 4000 Copyright © 2012 www.secdatabase.com. All Rights Reserved. Please Consider the Environment Before Printing This Document UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, DC 20549 FORM 8-K CURRENT REPORT Pursuant to Section 13 or 15(d) of The Securities Exchange Act of 1934 Date of Report (Date of earliest event reported) August 27, 2007 GOOGLE INC. (Exact name of registrant as specified in its charter) Delaware 0-50726 77-0493581 (State or other jurisdiction (Commission File Number) (IRS Employer of incorporation) Identification No.) 1600 Amphitheatre Parkway Mountain View, CA 94043 (Address of principal executive offices, including zip code) (650) 253-0000 (Registrants telephone number, including area code) Not Applicable (Former name or former address, if changed since last report) Check the appropriate box below if the Form 8-K filing is intended to simultaneously satisfy the filing obligation of the registrant under any of the following provisions (see General Instruction A.2. below): ¨ Written communications pursuant to Rule 425 under the Securities Act (17 CFR 230.425) ¨ Soliciting material pursuant to Rule 14a-12 under the Exchange Act (17 CFR 240.14a-12) ¨ Pre-commencement communications pursuant to Rule 14d-2(b) under the Exchange Act (17 CFR 240.14d-2(b)) ¨ Pre-commencement communications pursuant to Rule 13e-4(c) under the Exchange Act (17 CFR 240.13e-4(c)) Copyright © 2012 www.secdatabase.com.
    [Show full text]
  • 1. the Content Score Is the Resulting Score Provided by the Indexing Engine Avail
    INSTITUTO TECNOLÓGICO Y DE ESTUDIOS SUPERIORES DE MONTERREY CAMPUS MONTERREY PROGRAMA DE GRADUADOS EN ELECTRÓNICA, COMPUTACIÓN, INFORMACIÓN Y COMUNICACIONES TECNOLÓGICO DE MONTERREY CONCEPT BASED RANKING IN PERSONAL REPOSITORIES TESIS PRESENTADA COMO REQUISITO PARCIAL PARA OBTENER EL GRADO ACADÉMICO DE: MAESTRÍA EN CIENCIAS EN TECNOLOGÍA INFORMÁTICA POR ALBERTO JOAN SAAVEDRA CHOQUE MONTERREY, N. L., DICIEMBRE 2007 Concept Based Ranking in Personal Repositories by Alberto Joan Saavedra Choque Thesis Presented to the Gradúate Program in Electronics, Computing, Information and Communications School This thesis is a partial requirement to obtain the academic degree of Master of Science in Information Technology Instituto Tecnológico y de Estudios Superiores de Monterrey Campus Monterrey December 2007 A Gary, mi hermano Que me motiva a superarme cada día y a mirar las circunstancias desde diversos puntos de vista. A Yolanda, mi mamá Que me apoya incondicionalmente y me acompaño presencialmente durante un período de mi recorrido en este país que me acogió. A Alberto, mi papá Que me orienta cuando lo necesito y es mi punto de referencia para superar los obstáculos que se presentan en el camino. A Pilar, mi mejor amiga Que conoce mejor que nadie el sacrificio realizado durante esta aventura y me mantuvo enfocado en cumplir exitosamente la trayectoria iniciada. A mi abuelito, tíos y primos Que me hicieron sentir a la distancia, el respaldo y el apoyo de toda la familia. Reconocimientos Al Dr. Juan Carlos Lavariega, mi asesor Por su confianza y apoyo en la realización de esta investigación. Al Dr. Raúl Pérez, sinodal y coordinador de la maestría Por su disponibilidad constante y orientación para responder mis consultas Al Dr.
    [Show full text]
  • Runtime Repair and Enhancement of Mobile App Accessibility
    Runtime Repair and Enhancement of Mobile App Accessibility Xiaoyi Zhang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2018 Reading Committee: James Fogarty, Chair Jacob O. Wobbrock Jennifer Mankoff Program Authorized to Offer Degree: Computer Science & Engineering © Copyright 2018 Xiaoyi Zhang University of Washington Abstract Runtime Repair and Enhancement of Mobile App Accessibility Xiaoyi Zhang Chair of the Supervisory Committee: Professor James Fogarty Computer Science & Engineering Mobile devices and applications (apps) have become ubiquitous in daily life. Ensuring full access to the wealth of information and services they provide is a matter of social justice. Unfortunately, many capabilities and services offered by apps today remain inaccessible for people with disabilities. Built-in accessibility tools rely on correct app implementation, but app developers often fail to implement accessibility guidelines. In addition, the absence of tactile cues makes mobile touchscreens difficult to navigate for people with visual impairments. This dissertation describes the research I have done in runtime repair and enhancement of mobile app accessibility. First, I explored a design space of interaction re-mapping, which provided examples of re-mapping existing inaccessible interactions into new accessible interactions. I also implemented interaction proxies, a strategy to modify an interaction at runtime without rooting the phone or accessing app source code. This strategy enables third-party developers and researchers to repair and enhance mobile app accessibility. Second, I developed a system for robust annotations on mobile app interfaces to make the accessibility repairs reliable and scalable. Third, I built Interactiles, a low-cost, portable, and unpowered system to enhance tactile interaction on touchscreen phones for people with visual impairments.
    [Show full text]
  • International Journal of Software and Web Sciences (IJSWS)
    ISSN (Print): 2279-0063 International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Online): 2279-0071 International Journal of Software and Web Sciences (IJSWS) www.iasir.net An Efficient Crawling Algorithm for Optimization of Web Page for Major Search Engines Sanjeev Dhawan1, Pooja Choudhary2 1Faculty of Computer Science & Engineering, 2Research Scholar Department of Computer Science & Engineering, University Institute of Engineering and Technology, Kurukshetra University, Kurukshetra-136 119, Haryana, INDIA. E-mail(s): [email protected], [email protected] Abstract: Search engine optimization (SEO) is the process of improving the visibility and scope of a website or a web page in search engines' search results. In general, the earlier (or higher ranked on the search results page), and more frequently a site appears in the search results list, the more visitors it will receive from the search engine's users. SEO may target different kinds of search, including image search, local search, video search, academic search news search and industry-specific vertical search engines. As an Internet marketing strategy, SEO considers how search engines work, what people search for, the actual search terms or keywords typed into search engines and which search engines are preferred by their targeted audience. Optimizing a website may involve editing its content and HTML and associated coding to both increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. Promoting a site to increase the number of backlinks, or inbound links, is another SEO tactic. This paper proposes an efficient crawling algorithm for indexing and searching from the database.
    [Show full text]
  • Google Benefit from News Content
    Google Benefit from News Content Economic Study by News Media Alliance June 2019 EXECUTIVE SUMMARY: The following study analyzes how Google uses and benefits from news. The main components of the study are: a qualitative overview of Google’s usage of news content, an analysis of news content on Google Search, and an estimate of revenue Google receives from news. I. GOOGLE QUALITATIVE USAGE OF NEWS ▪ News consumption increasingly shifts towards digital (e.g., 93% in U.S. get some news online) ▪ Google has increasingly relied on news to drive consumer engagement with its products ▪ Some examples of Google investment to drive traffic from news include: o Significant algorithmic updates emphasize news in Search results (e.g., 2011 “Freshness” update emphasized more recent search results including news) ▪ Google News keeps consumers in the Google ecosystem; Google makes continual updates to Google News including Subscribe with Google (introduced March 2018) ▪ YouTube increasingly relies on news: in 2017, YouTube added “Breaking News;” in 2018, approximately 20% of online news consumers in the US used YouTube for news ▪ AMPs (accelerated mobile pages) keep consumers in the Google ecosystem II. GOOGLE SEARCH QUANTITATIVE USAGE OF NEWS CONTENT A. Key statistics: ▪ ~39% of results and ~40% of clicks on trending queries are news results ▪ ~16% of results and ~16% of clicks on the “most-searched” queries are news results B. Approach ▪ Scraped the page one of desktop results from Google Search o Daily scrapes from February 8, 2019 to March 4, 2019
    [Show full text]
  • Google: the World's First Information Utility?
    BISE – STATE OF THE ART Google: The World’s First Information Utility? The idea of information utilities came to naught in the 70s and 80s, but the world wide web, the Internet, mobile access, and a plethora of access devices have now made information utilities within reach. Google’s business model, IT infrastructure, and innovation strategies have created the capabilities needed for Google to become the world’s first global information utility. But, the difficulty of monetizing utility services and concerns about privacy, which could bring government regulation, might stymie Google’s growth plans. DOI 10.1007/s12599-008-0011-6 a national scale (Sackmann and Nie 1970; the number of users to the point where it The Authors Sackman and Boehm 1972). Not to be left is the dominant search player in both the out, the idea was also promoted by the U.S. and the world. This IT-based strategy Rex Chen Computer Usage Development Institute has allowed Google to reach more than Prof. Kenneth L. Kraemer, PhD in Japan, the British Post Office in the UK, $16 billion in annual revenue, a stock price Prakul Sharma and Bell Canada and the Telecommunica- over $600 per share and a market capital- University of California, Irvine tions Board in Canada (Press 1974). ization of nearly $200 billion. Center for Research on Information The idea was so revolutionary at the Over the next decade, Google plans to Technology and Organization (CRITO) time that at least one critic called for a extend its existing services to the wire- 5251 California Ave., Ste.
    [Show full text]
  • Securities and Exchange Commission Form S-1
    S-1 1 ds1.htm FORM S-1 Table of Contents As filed with the Securities and Exchange Commission on April 29, 2004 Registration No. 333- SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 FORM S-1 REGISTRATION STATEMENT Under The Securities Act of 1933 GOOGLE INC. (Exact name of Registrant as specified in its charter) Delaware 7375 77 -0493581 (State or other jurisdiction of (Primary Standard Industrial (I.R.S. Employer incorporation or organization) Classification Code Number) Identification Number) 1600 Amphitheatre Parkway Mountain View, CA 94043 (650) 623-4000 (Address, including zip code, and telephone number, including area code, of Registrant’s principal executive offices) Eric Schmidt Chief Executive Officer Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043 (650) 623-4000 (Name, address, including zip code, and telephone number, including area code, of agent for service) Copies to: Larry W. Sonsini, Esq. David C. Drummond, Esq. William H. Hinman, Jr., Esq. David J. Segre, Esq. Jeffery L. Donovan, Esq. Simpson Thacher & Bartlett LLP Wilson Sonsini Goodrich & Rosati, Anna Itoi, Esq. 3330 Hillview Avenue P.C. Google Inc. Palo Alto, California 94304 650 Page Mill Road 1600 Amphitheatre Parkway (650) 251-5000 Palo Alto, California 94304-1050 Mountain View, CA 94043 (650) 493-9300 (650) 623-4000 Approximate date of commencement of proposed sale to the public: As soon as practicable after the effective date of this Registration Statement. If any of the securities being registered on this Form are being offered on a delayed or continuous basis pursuant to Rule 415 under the Securities Act of 1933, as amended (the “Securities Act”), check the following box.
    [Show full text]