Introduction to Computer Science CS 101
Ilir Capuni, Boston University

Plan for today
- Internet search
- Evaluations

What is the web
- Internet: the hardware (networked computers)
- Web: the software (pages and documents)
- HTML (HyperText Markup Language)
  - Text
  - Images
  - Links
  - …

The need for search
- No comment

AltaVista and Yahoo
- Text based
- Search for "Toyota"
  - You would get links to some celebrity or to some nasty but popular page that merely mentions it
- It became useless as the web started growing

Google
- Different approach

Modeling the web

Graph
- G = (V, E)
- Weights on edges to denote importance

Representing a graph
- Matrix
- Linked list
- Arrays…

A snippet of the web yesterday

Task
- Having this representation, we would like to rank the pages according to their relative importance within the graph
- This algorithm does not apply only to the web. It could be used on any collection of entities with reciprocal quotations and references

Democracy on the web
- Basic idea: consider a link from page A to page B as a vote by page A for page B
- Check also the voter's reputation: the higher the voter's reputation, the more its vote weighs

The PageRank
- The PageRank of a page is defined recursively and depends on the number and the PageRank of all pages that link to it
- It assigns values from 0 to 10

Is it possible to manipulate?
- Yes!
- Many ways have been invented to manipulate the PageRank

Are there better algorithms?
- Yes, there are
- HITS by J. Kleinberg (a bit before PageRank, and it is referenced in the PageRank paper)
- IBM Clever
- TrustRank

Any such things in the history?
- Yes!
- Citation analysis by Eugene Garfield in the 1950s at UPenn
- Hyper Search, developed by Massimo Marchiori at the University of Padova

The random surfer model
- We consider a surfer that starts from one page and follows links at random
- PageRank represents the probability that a random surfer will arrive at a particular page

What does Google do
- Back end
  - Crawls the internet
  - Creates the graph
  - Analyzes it and updates the rank
- Front end
  - User enters a query
  - Google analyzes the query
  - Checks its tables
  - Outputs the pages that have that tag and sorts them according to PageRank

Intentional surfer model
- Once you and many programs that you use get addicted to Google, they will start sending information to Google about your habits and web history
- Question: do you like this after yesterday's class?
- Question: how come all of a sudden Google is offering you, as a search or ad result, something related to an email that you sent to your friend a week ago?

Many uses
- Impact factor of scientific journals
- Word sense disambiguation
- Optimized web crawling

rel="nofollow" option
- In 2005 Google implemented a new value, "nofollow", for the rel attribute of the HTML link and anchor elements
- One can mark some links on one's page with nofollow to prevent Google from using them in the PageRank computation

Where does the money come from
- Advertising industry: bid-per-click auctions

Bing
- Lists search suggestions as queries are entered, plus a list of related searches (called the "Explorer pane") based on semantic technology from Powerset, which Microsoft purchased in 2008
- As of January 2010, Bing is the third largest search engine on the web by query volume, at 3.16%, after its competitors Google at 85.35% and Yahoo at 6.15%, according to Net Applications

Cleaning up your name.
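
Appendix sketch: returning to the recursive PageRank definition and the random surfer model from earlier in these slides, the idea can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not Google's actual implementation. The four-page graph `web`, the function name `pagerank`, the damping factor of 0.85 (the chance that the surfer follows a link instead of jumping to a random page), and the fixed number of iterations are all illustrative choices that do not come from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by repeated averaging (power iteration).

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values for this sketch.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)

    # Start with the surfer equally likely to be on any page.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Each outgoing link is a "vote" carrying an equal share
                # of the voter's own rank.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: A links to B and C, B to C, C to A, D to C.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects votes from three pages and ends up ranked highest, while page D, which nobody links to, keeps only the random-jump share. The scores here are probabilities that sum to 1; they are not on the 0 to 10 scale mentioned in the slides.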