How Google Works: Are Search Engines Really Dumb and Should Educators Care?
Paul Barron, Director of Library and Archives, George C. Marshall Foundation
All Rights Reserved. This presentation may be copied and distributed for nonprofit educational purposes only. Revised November 2012
We know our students …
“Whereas libraries once seemed like the best answer to the question, Where do I find…? the search engine now rules.”
“No Brief Candle: Reconceiving Research Libraries for the 21st Century,” Part II
Council on Library and Information Resources http://www.clir.org/pubs/reports/pub142/pub142.pdf
Cartoon: JEFF STAHLER: (c) Columbus Dispatch. Dist. by Newspaper Enterprise Association, Inc.
For them, “to Google” is a lifestyle, a habit pattern. Do you agree?
VAASL 2012: Librarians as Leaders
Students’ #1 Online Information Source
Google was the go-to resource for almost all of the students in the sample. Nearly all of the students in the sample reported always using Google, both for course-related research and everyday life research.
Cartoon caption: “Great! We are one level above gossip.”
“How College Students Seek Information in the Digital Age” http://tinyurl.com/yfp7ol5
Should educators be concerned?
“There are consequences to
our students and our educational
system if we [allow] a search
engine to define the parameters
of effective research.”
The University of Google: Education in the (Post) Information Age Tara Brabazon
Especially when …
“The prevalence of Google in student research is well-documented, but the Illinois researchers found something they did not expect: students were not very good at using Google.”
“They were clueless about how the search engine organizes and displays its results.”
“Consequently, the students did not know how to build a search that would return good sources.”
What Students Don’t Know, Ethnographic Research in Illinois Academic Libraries Project, Inside Higher Ed (http://tinyurl.com/3m6yyhp)
Why learn how Google works? Because …
“We expect a lot from search engines. We ask them vague questions about topics that we are unfamiliar with and anticipate a concise, organized response.”
“You would have better success if
you laid your head on the keyboard
and coaxed the computer to read your mind.”
Understanding Search Engines: Mathematical Modeling and Text Retrieval Michael W. Berry and Murray Browne
If educators hope …
To change students’ excessive use of Google, educators must embrace
Google and learn how the search engine works, in order …
To influence students to integrate Google use with other reliable sources of information.
Presentation Objective
Increase our understanding of how search engines and Google work by dispelling search engine myths
Propose a plan to increase the use of library research databases
Not by excluding Google use
Integrate Google use with use of library databases
Goal - Enable us to help our students become better researchers
Presentation Objective: Dispel …
Search engine myths:
understand a searcher’s query,
treat all sites and domains the same when determining results, and
determine the results based on the popularity of the site with searchers.
Cartoon captions: “But we’re not equal.” “I’m .edu.” “I’m .net.”
Presentation Objective: Dispel …
Search engine myths:
Google accepts payment for ranking a site higher in the search results.
Google removes sites from the database that staff find offensive or when requested by searchers.
Myth: Google Accepts “Pay for Ranking”
“At Google we take our commitment to delivering useful and impartial search results very seriously.” “We don’t ever accept payment to add a site to our index, update it more often, or improve its ranking.”
Matt Cutts Head of Google’s Webspam Team http://www.google.com/howgoogleworks
To understand how search engines work …
…we must understand, “search engines have no understanding of words or language. (They) don't recognize user intent, can't distinguish goal-oriented search from browsing search.”
A ResourceShelf Interview: 20 Questions with Dr. Gary Flake, Ph.D. Head of Yahoo! Research Labs http://searchenginewatch.com/showPage.html?page=3372051
Thursday, June 3, 2004
And in 2010 …
“We can write a computer program to beat the very
best human chess players, but
we can't write a program to
understand a sentence anywhere near the precision
of a child.”
“Helping Computers Understand Language” Steven Baker, Google Software Engineer Official Google Blog January 19, 2010
And in 2012 …
“Google has a confession to make: It does
not understand you. Google Fellow Amit Singhal says Google doesn’t understand the question. ‘We cross our fingers
and hope someone on the web
has written about these things or topics.’ ”
Google Knowledge Graph Could Change Search Forever
http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
If Google doesn’t understand my query …
… how does Google determine how to select and rank the results in response to my query?
What Google Considers on the Webpage
Google’s algorithms rely on more than 200 unique signals to determine a ranking. For example,
how often the search terms occur on the webpage,
if the search terms appear in the title or URL, and
whether synonyms of the search terms occur on the page.
Facts about Google and Competition http://www.google.com/press/competition/howgooglesearchworks.html
An Update to our Search Algorithms (8/10/12) http://insidesearch.blogspot.com/2012/08/an-update-to-our-search-algorithms.html
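The three on-page signals named above can be illustrated with a small sketch. This is not Google's algorithm, whose 200-plus signals are not public. It only counts what the slide lists: how often each search term occurs, whether it appears in the title or URL, and whether a synonym occurs. The function names and the synonym table are hypothetical and exist only for this illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def on_page_signals(query, title, url, body, synonyms=None):
    """Report simple on-page signals for each query term.

    `synonyms` is a hypothetical hand-made table {term: [synonym, ...]};
    a real search engine derives synonyms in ways not described in the source.
    """
    synonyms = synonyms or {}
    body_tokens = tokenize(body)
    title_tokens = set(tokenize(title))
    url_tokens = set(tokenize(url))
    signals = {}
    for term in tokenize(query):
        signals[term] = {
            # how often the search term occurs on the webpage
            "count_in_body": body_tokens.count(term),
            # whether the search term appears in the title or URL
            "in_title": term in title_tokens,
            "in_url": term in url_tokens,
            # whether synonyms of the search term occur on the page
            "synonym_hits": sum(body_tokens.count(s) for s in synonyms.get(term, [])),
        }
    return signals

if __name__ == "__main__":
    report = on_page_signals(
        query="climate change",
        title="Climate Change Facts",
        url="http://example.gov/climate",
        body="Climate change raises sea levels. Global warming affects climate.",
        synonyms={"change": ["warming"]},
    )
    for term, values in report.items():
        print(term, values)
```

A page can score well on every one of these counts and still be inaccurate, which is the point the following slides make.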
What Google Considers Off the Webpage: Links
PageRank
PageRank – A measure of the number and the quality of links to a webpage.
Assumption - Important webpages receive more links from other webpages.
Facts about Google and Competition www.google.com/press/competition/howgooglesearchworks.html
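To make "number and quality of links" concrete, here is a minimal sketch of the textbook PageRank iteration. It assumes a tiny hand-made link graph and a damping factor of 0.85, a commonly cited value that the slides do not mention. It illustrates the idea only and is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    Each page splits its current score evenly among the pages it links to,
    so a link from a high-scoring page is worth more than a link from a
    low-scoring one.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page starts each round with a small baseline score
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page with no outgoing links spreads its score evenly
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

if __name__ == "__main__":
    # hypothetical four-page web: a, b and d all link to c; c links back to a
    graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, page c receives the most links and ends with the highest score. Page a receives only one link, but that link comes from c, so a outranks b and d. Nothing in the calculation looks at whether a page's content is accurate.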
Question
Okay I understand PageRank but …
Matt Cutts of Google states,
“Popularity is different from accuracy and PageRank is different than popularity.”
http://www.youtube.com/watch?v=rNsRpJm3z2g
Therefore, PageRank is different from accuracy.
Let’s test that assertion by searching for …
Search Results
Jew Watch News is the 5th most popular and accurate result for our search.
Jew Watch – A Popular & Accurate Site?
The Value of Quality Links
“With PageRank, five or six high-quality links from websites would be valued much more highly than twice as many links from less reputable or established sites.”
Librarian Central How does Google collect and rank results? http://www.google.com/librariancenter/articles/0512_01.html
Checking the Links to JewWatch.com
Law School Links to Jew Watch.com
Look at Google’s 3rd Result
Google’s Explanation – “subtleties of language”
Please explain why Google does not consider …
… the fact that the site is popular with us, the searchers who view the sites!
Why not consider searchers’ preferences?
"We believe the approach which relies heavily on an individual's tastes and preferences [to rank results] just doesn't produce the quality and relevant ranking that our algorithms do."
Amit Singhal, Google Fellow, “This stuff is tough,” 25 February 2010 http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html
Why!?!
First: “We have all been trained to trust Google and click on the first result.”
“How Google Measures Search Quality” Datawocky http://tinyurl.com/6mpt4u
“College students trust Google; they click on the number one abstract most of the time, even when the abstracts are less relevant.”
In Google We Trust: Users’ Decisions on Rank, Position, and Relevance, Laura Granka, Journal of Computer-Mediated Communication
Trusting Google too Much?
“Second: For informational queries … if a result on page 4 provides better information than the results on the first three pages, users will not know this result exists!
Therefore, usage behavior does not provide the best feedback on the rankings.”
Cartoon caption: “But we are the best results!”
“How Google Measures Search Quality” Datawocky http://tinyurl.com/6mpt4u
And look at the first three results.
“… 100% of participants looked at the top of the page,
85% looked at the bottom listing. Anything below the fold dropped dramatically to
50% at the top and a lowly 20% at the bottom.”
Eye Tracking Web Usability Study Reveals the “Golden Triangle” June 14, 2010 http://tinyurl.com/5tj4mqw
Image Date - 14th November 2011
Google Gullibility
“Many users are at the search engine's mercy and mainly click the top links — a behavior [called] Google Gullibility. Sadly, while these top links are often not what they really need, users don't know how to do better.”
Jakob Nielsen's Alertbox, February 4, 2008 User Skills Improving, But Only Slightly http://www.useit.com/alertbox/user-skills.html
Consider this …
“The computer screen is … literally
a small thing [that] may display just
over 300 words. If this world becomes
our reality, we actually are relying on
less information, not the more that is available.”
“The Google-ization of Knowledge” Natasja Larson, Laura Servage, and Jim Parsons ; Faculty of Education; University of Alberta http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/28/03/99.pdf
Google doesn’t need to consider …
… the popularity of a website with searchers because their algorithm is so up-to-date that Google always returns the best results. Right?
Cartoon captions: “RIGHT!” “RIGHT!”
Relevance in Google = Only an Opinion
Google’s … “assessments of the "value" of a web page are subjectively-determined [by] formulae to come up with a ranking. PageRanks are opinions. They're professional opinions, but they remain opinions.”
“Google Replies to SearchKing Lawsuit” Google v. http://research.yale.edu/lawmeme/ Thursday, January 9, 2003
Google's rankings are protected opinion.
"The court simply finds there is no conceivable way to prove that the relative significance assigned to a given
Web site is false. Accordingly, the court concludes
Google's PageRanks are entitled to full constitutional protection.”
“Judge Dismisses Suit Against Google” http://news.cnet.com/2100-1032_3-1011740.html May 30, 2003
Evaluating Google’s Opinion
Google returns all sites with the phrase, martin luther king.
Google’s 4th Result as of 10-10-2012
Martin Luther King.org Homepage
Martin Luther King.org is hosted by …
The student wants to know …
Why was that site returned as the 4th result among the 89.3 million results!?!
I thought Google and other search engines always returned the best results.
Checking for .edu Links to the Webpage
Remember the importance of PageRank which measures the number and quality of links to a webpage.
Link Check – Returns results that are linked to a site; for example, .edu sites that are linked to Martin Luther King.org.
Link Check Results
QUESTION By reviewing the webpage description can you determine the purpose of the .edu sites linking to Martin Luther King.org?
Links from .edu & .gov Sites = Trust
“Google places a heavy bias on informational resources; .edu and .gov sites tend to rank higher than others.”
“Google is the best at determining true link quality and places a lot of weight on domain trust levels.”
“Can You Please Them All?” http://www.bruceclay.com/blog/archives/2006/08/can_you_please.html
Will Google remove MLK.org?
“The beliefs and preferences of those who work at Google, [and] the opinions of the
general public, do not determine or impact our search results. [W]e do not remove a page from our results because its content is unpopular or because we receive complaints concerning it.”
“An explanation of our search results” http://www.google.com/explanation.html
Google’s opinion is important; …
What can I do to influence the results returned by Google?
Question.
Search Engine Components
Spider/Web Crawler/Robot
Index
Search Engine
The only feature that you can control is the query entered into the search engine.
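A toy version of the three components shows why the query is the only part a searcher controls. In this sketch the "crawl" is a hand-made dictionary of page text (no network access), the index maps each word to the pages containing it, and the search step returns the pages that contain every query word. The page addresses and function names are hypothetical, and real search engines are far more elaborate.

```python
import re
from collections import defaultdict

def words_in(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(crawled_pages):
    """Index step: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in crawled_pages.items():
        for word in words_in(text):
            index[word].add(url)
    return index

def search(index, query):
    """Search step: return pages containing every word in the query."""
    words = words_in(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

if __name__ == "__main__":
    # stand-in for what a spider would have collected (hypothetical pages)
    crawled = {
        "example.gov/climate": "climate change and rising sea levels",
        "example.com/maps": "a map of the world with sea routes",
        "example.edu/weather": "climate data and global temperatures",
    }
    index = build_index(crawled)
    print(search(index, "sea"))            # broad query, two pages
    print(search(index, "sea levels"))     # one more word, one page
```

The crawl and the index are fixed before the searcher arrives. Adding a single well-chosen word to the query is what narrows the results.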
Keyword Searching
“Keyword-based search works well if the users know exactly what they want and formulate queries with the ‘right words.’”
“It does not help much and is sometimes even hopeless if the users only have vague concepts about what they are asking.”
Toward Topic Search on the Web, Microsoft Research, March 2011 http://research.microsoft.com/apps/pubs/default.aspx?id=145837
Cartoon caption: “Let’s go see the librarian.”
Searching With the Fewest Words Possible
“There's a real imbalance in Web search. Users give us three words at a time. People type the query "map," and then they get upset if it's not the map they were thinking of.”
Callout: 81% of search engine queries are 4 words or less.
“The Future of Search: The head of Google Research talks about his group's projects.” MIT Technology Review http://tinyurl.com/2pmfsu
And Never Mention the Topic
“We find that searchers turn so quickly to
Google that they don't think about what they're searching for. It's surprising, we'll see people trying to find out something about a topic, but never mention the topic.”
The Art of the Field Study http://googleblog.blogspot.com/2008/11/art-of-field-study.html
Queries by Middle School Students
“A predominate difficulty students experience while performing
Web-based research is constructing effective search strings.”
“[M]iddle school students demonstrate
unsophisticated skills when constructing
search strings, using mainly broad terms
and phrases.”
“Internet Searching by K-12 Students: A Research-based Process Model” http://eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1b/a8/26.pdf
Queries by High School Students
“ [H]igh school students
struggle with conceptualizing
the topic for their query, sometimes omitting
required concepts.”
“Internet Searching by K-12 Students: A Research-based Process Model” http://eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1b/a8/26.pdf
Queries by College Students
“[S]earch engines generally performed poorly, a lack of
computer skills and an inability to construct appropriate search statements limited [college]
students' success.”
Nowicki, Stacy. Student vs. Search Engine: Undergraduates Rank Results for Relevance portal: Libraries and the Academy - Volume 3, Number 3, July 2003
Can students craft a four-word query on …
… the effects of climate change on global temperatures and sea levels worldwide?
Why are WIKIPEDIA results ranked so highly?
Why Search? Just Take Me to WikipediA.
Is there a preference for
WIKIPEDIA or other top level domains?
Yes!
“Google’s authority-based algorithm is domain-centric. Google has focused on domain-trusting by pushing to the top of the results massive sites like Wikipedia that couldn’t have been created by spammers.”
The Google Cache Google’s New Algorithm: if($domain==’wikipedia.org’){$rank=1;} http://tinyurl.com/yv3xo6
WIKIPEDIA – A Most Important Domain
But this is not the answer!
The Remedy
“Given their popularity, knowing more about search engines is vital to understanding information access in a digital age.”
Cartoon caption: “I understand Google; I’ll search for title pages with domain-limited Boolean queries.”
The social, political, economic, and cultural dimensions of search engines: An introduction. Hargittai, E., (2007). Journal of Computer-Mediated Communication, 12(3), article 1. http://jcmc.indiana.edu/vol12/issue3/hargittai.html
Teach Students Advanced Search Syntaxes
“Advanced syntax users demonstrate search expertise
that the majority of user population does not. They are:
more adept at combining query operators to
formulate powerful query statements and
return more relevant results
Not only were they more successful in their searching,
they were consistently more successful.”
Investigating the Querying and Browsing Behavior of Advanced Search Engine Users research.microsoft.com/~ryenw/papers/WhiteSIGIR2007b.pdf
Searching: An Aid to Complex Reasoning
Brain Activity from Internet Search The image on the left displays brain activity while reading a book; the image on the right displays activity while engaging in an Internet search.
“Internet searching engages complicated brain activity which may help improve brain function. [The] Web-savvy group registered activity in the areas of the brain which control decision-making and complex reasoning.”
“UCLA study finds that searching the Internet increases brain function” November 15, 2008 http://newsroom.ucla.edu/portal/ucla/ucla-study-finds-that-searching-64348.aspx
Google Advanced Search Syntax Query
intitle:“climate change” AND “global temperatures” AND “sea levels” AND site:gov
This query will find results: from a .gov website, with climate change in the title of the webpage, and the phrases “global temperatures” AND “sea levels” on the webpage.
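The structure of that query (a title phrase, required body phrases, a domain limit) can be assembled mechanically, which may help when showing students how each piece narrows the search. The sketch below is only an illustration: the function name and its parameters are hypothetical, and the search address with its `q` parameter is an assumption about how Google accepts a query in a URL.

```python
from urllib.parse import urlencode

def build_query(title_phrase, body_phrases, domain):
    """Assemble an advanced-syntax query string.

    title_phrase -> intitle:"..."  (phrase must appear in the page title)
    body_phrases -> "..."          (each exact phrase must appear on the page)
    domain       -> site:...       (limit results to one domain, e.g. gov)
    """
    parts = [f'intitle:"{title_phrase}"']
    parts += [f'"{phrase}"' for phrase in body_phrases]
    parts.append(f"site:{domain}")
    return " AND ".join(parts)

def search_url(query):
    """Build a search URL; the base address and `q` parameter are assumptions."""
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    query = build_query("climate change", ["global temperatures", "sea levels"], "gov")
    print(query)
    print(search_url(query))
```

Swapping `gov` for `edu`, or changing the title phrase, gives students a quick way to see how each operator changes the results.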
Google Advanced Search Syntax Results
NOTE All results are .gov sites, with “climate change” in the titles, and the terms, “global temperatures” and “sea levels” in the webpage descriptions.
From Google to the Library Databases
Demonstrate that Google syntaxes and queries,
with minor modification, work in the proprietary
databases and may provide more relevant sources.
From Google to a Library Database
Students can limit the search to the structure of the document.
Academic Journals
It works - minds can be changed!
“I modeled my presentation, ‘Internet Privacy Laws are
Necessary,’ for juniors (AP English 11) after the VEMA presentation.
“They searched their usual way. Then I demonstrated a search using advanced search techniques. Then they searched using advanced search syntaxes.”
“They narrowed their search results from one million to 55 and it was amazing how many hits were on target.”
Nancy Keenan; Library Media Specialist/Computer Coordinator Glenvar High School Salem, Virginia
Minds can be changed!
“I revised my lesson plan for teaching students how to search the
Web and library databases. Students were
frustrated using the Web; when we got to
Gale and ABC-CLIO their amazement in the difference of the quality of
information was priceless. One student researching working women of
the 1930s said, ‘I found much more in Student Resource Center.’ ”
Lori Donovan, NBCT, Teacher-Librarian Thomas Dale High School Chester, Virginia
The Importance of “Friends!”
Gang, the librarian is better than Google!
http://www.oclc.org/reports/2005perceptions.htm
And the winner is …