How Does Google Work? Google As a Search Engine
Ethical Forum University Foundation November 2009
Vincent Blondel, Louvain School of Engineering, UCL
«Don’t be evil»
1998 Larry Page and Sergey Brin
1996: Research project by Larry Page and Sergey Brin at Stanford University.
1998: Google Inc. founded. 25 million webpages indexed.
2000: One billion webpages indexed.
2004: Google goes public.
2005: One billion images.
2006: "to google" added to the Oxford English Dictionary.
PageRank 6
PageRank 5
Google employs a number of techniques to improve search quality including PageRank, anchor text, and proximity information.
www.google.be PageRank 10
www.uclouvain.be PageRank 9
www.kbr.be PageRank 8
www.fnrs.be PageRank 7
www.fondationuniversitaire.be PageRank 6
The web: a democracy of links
http://kvina.niva.no/booking/
23 billion webpages (http://www.worldwidewebsize.com/)
PageRank democracy: let the links vote
Your webpage inherits a high PageRank if it is pointed to by pages that themselves have a high PageRank.
How is this done?
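The recursive idea that a page is important when important pages link to it can be illustrated with a small power iteration. This is only a minimal sketch, not Google's production method: the four-page toy web, the function name `pagerank`, and the damping factor of 0.85 (the value commonly cited from the original PageRank paper) are assumptions made for illustration. The scores it produces are probabilities that sum to 1, not the 0 to 10 scale shown on the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by repeated voting (power iteration).

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed, commonly cited value.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its vote equally among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D; nobody links to D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, page C ends up with the highest score because it collects the most votes, and page A comes second: it has only one incoming link, but that link comes from the highly ranked C.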
• Frequent updates (Googlebot)
• Storage: 36 data centers (19 in the U.S., 12 in Europe), about 450,000 computers
• Sophisticated distributed computation
To be googleable or not to be
Golden triangle
• Position 1: 100%
• Position 2: 100%
• Position 3: 100%
• Position 4: 85%
• Position 5: 60%
• Position 6: 50%
• Position 7: 50%
• Position 8: 30%
• Position 9: 30%
• Position 10: 20%
Genesis 1:1 "In the beginning God created the heaven and the earth."
Organic results: organic search.
Sponsored links: paid for by the website owners.
Google ranking robustness
• Google changes the algorithm
• Google bombing (Google: «miserable failure»)
• Buy links
[Figure: email exchange dated December 16 and December 18, 2006, involving Vincent Blondel and P. R. Kumar, Franklin W. Woeltge Professor of Electrical and Computer Engineering, and Research Professor, Coordinated Science Lab]
Email from a UCL colleague (sometime in 2007).
«One day, I needed the CV of the rector of UCL for a project submission. I used Google and typed "Bernard Coulie". Without paying attention, I actually entered "BernardCoulie", without a space, and landed on two links internal to UCL. They were two files listing the salaries of all members of the university. By accident, these files were poorly protected and accessible to everyone.
I alerted the IT staff. General alarm. During the night everything was fixed.
That left the caches, for which Google and the other search engines had to be contacted so that the information would disappear completely from the web.»
Two months ago Two weeks ago Two days ago
2356300965432378987 x 577810098750098318 = ?
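The slide leaves this product open. As a side note, and only as a sketch, the exact value can be obtained with arbitrary-precision integer arithmetic, which Python integers provide natively; the variable names below are chosen for illustration.

```python
# The two factors from the slide. Python integers have arbitrary precision,
# so the product is exact and does not overflow.
a = 2356300965432378987
b = 577810098750098318

product = a * b
print(product)

# Sanity check: dividing the product by one factor gives back the other.
print(product // a == b and product % a == 0)
```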