Introduction to Computer Science CS 101

Ilir Capuni
Boston University

Plan for today
  • Internet search
  • Evaluations

What is the web
  • Internet: the hardware (networked computers)
  • Web: the software (pages and documents)
  • HTML (HyperText Markup Language)
      • Text
      • Images
      • Links
      • …

The need for search
  • No comment

AltaVista and Yahoo
  • Text based
  • Search for Toyota
      • You would get links to some celebrity, or to some nasty but popular page that mentions it
  • It became useless as the web started growing

Google
  • Different approach

Modeling the web

Graph
  • G = (V, E)
  • Weights on edges to denote importance

Representing a graph
  • Matrix
  • Linked list
  • Arrays…

A snippet of the web yesterday

Task
  • Having this representation, we would like to rank the pages according to their relative importance within the graph
  • This algorithm does not apply only to the web. It could be used for any collection of entities with reciprocal quotations and references

Democracy on the web
  • Basic idea: consider a link from page A to page B as a vote by page A for page B
  • Check the voter's reputation too: the higher the voter's reputation, the more its vote weighs

The PageRank
  • The PageRank of a page is defined recursively and depends on the number and the PageRank of all pages that link to it
  • It assigns values from 0 to 10

Is it possible to manipulate it?
  • Yes!
  • Many ways have been invented to manipulate the PageRank

Are there better algorithms?
  • Yes, there are
  • HITS by J. Kleinberg (slightly earlier than PageRank, and referenced in the PageRank paper)
  • IBM Clever
  • TrustRank

Any such things in history?
  • Yes!
  • Citation analysis by Eugene Garfield in the 1950s at UPenn
  • Hyper Search, developed by Massimo Marchiori at the University of Padova

The random surfer model
  • We consider a surfer who starts from one page and clicks on links at random
  • PageRank represents the probability that a random surfer will arrive at a particular page

What does Google do?
  • Back end
      • Crawls the internet
      • Creates the graph
      • Analyzes it and updates the rank
  • Front end
      • User enters a query
      • Google analyzes the query
      • Checks its tables
      • Outputs the pages that have that tag and sorts them according to PageRank

Intentional surfer model
  • Once you, and many of the programs you use, get addicted to Google, they will start sending information to Google about your habits and web history
  • Question: do you like this after yesterday's class?
  • Question: how come Google is all of a sudden offering you, as a search or ad result, something related to an email that you sent to your friend a week ago?

Many uses
  • Impact factor of scientific journals
  • Word sense disambiguation
  • Optimized web crawling

rel="nofollow" option
  • In 2005 Google implemented a new value, "nofollow", for the rel attribute of the HTML link and anchor elements
  • You can mark some of the links on your page with nofollow to prevent Google from using them in the PageRank computation

Where does the money come from?
  • Advertising industry: bid-per-click auctions

Bing
  • Lists search suggestions as queries are entered, plus a list of related searches (called "Explorer pane") based on semantic technology from Powerset, which Microsoft purchased in 2008
  • As of January 2010, Bing is the third largest search engine on the web by query volume, at 3.16%, after its competitors Google at 85.35% and Yahoo at 6.15%, according to Net Applications

Cleaning up your name
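
To make the voting idea and the recursive definition of PageRank from the slides concrete, here is a minimal sketch in Python. It is not Google's implementation. The function name `pagerank`, the four-page example web, the damping factor of 0.85 (the chance that the random surfer follows a link rather than jumping to a random page) and the fixed number of iterations are all assumptions made for illustration; none of them come from the slides. The graph is stored as a dictionary of adjacency lists, one of the representations mentioned above. The scores are probabilities that sum to 1, not values on a 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by repeatedly passing rank along links.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                # Each link is a vote, weighted by the voter's own rank
                # and split evenly among all the links on the voting page.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and D all link to C; C links back to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, page C collects the most votes and ends up with the highest score, while page D, which nobody links to, keeps only the small baseline share that every page receives. Page A ranks well even though only C links to it, because C's vote carries C's high reputation.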
