How Google Works: Are Search Engines Really Dumb and Should Educators Care?

Paul Barron, Director of Library and Archives, George C. Marshall Foundation

All Rights Reserved. This presentation may be copied and distributed for nonprofit educational purposes only. Revised November 2012

We know our students …

“Whereas libraries once seemed like the best answer to the question, Where do I find…? the search engine now rules.”

“No Brief Candle: Reconceiving Research Libraries for the 21st Century,” Part II

Council on Library and Information Resources http://www.clir.org/pubs/reports/pub142/pub142.pdf

JEFF STAHLER: (c) Columbus Dispatch Dist. by Newspaper Enterprise Association, Inc

For them, “to Google” is a lifestyle, a habit pattern. Do you agree?

VAASL 2012: Librarians as Leaders

Students’ #1 Online Information Source

Google was the go-to resource for almost all of the students in the sample. Nearly all of the students in the sample reported always using Google, both for course-related research and everyday life research.

Great! We are level above gossip.

“How College Students Seek Information in the Digital Age” http://tinyurl.com/yfp7ol5

Should educators be concerned?

“There are consequences to our students and our educational system if we [allow] a search engine to define the parameters of effective research.”

The University of Google: Education in the (Post) Information Age, Tara Brabazon

Especially when …

“The prevalence of Google in student research is well-documented, but the Illinois researchers found something they did not expect: students were not very good at using Google.”

“They were clueless about how the search engine organizes and displays its results.”

“Consequently, the students did not know how to build a search that would return good sources.”

What Students Don’t Know, Ethnographic Research in Illinois Academic Libraries Project, Inside Higher Ed (http://tinyurl.com/3m6yyhp)

Why learn how Google works? Because …

“We expect a lot from search engines. We ask them vague questions about topics that we are unfamiliar with and anticipate a concise, organized response.”

“You would have better success if you laid your head on the keyboard and coaxed the computer to read your mind.”

Understanding Search Engines: Mathematical Modeling and Text Retrieval, Michael W. Berry and Murray Browne

If educators hope …

• To change students’ excessive use of Google, educators must embrace Google and learn how the search engine works, in order …

• To influence students to integrate Google use with other reliable sources of information.

Presentation Objective

• Increase our understanding of how search engines and Google work by dispelling search engine myths

• Propose a plan to increase the use of library research databases

  • Not by excluding Google use

  • Integrate Google use with use of library databases

• Goal - Enable us to help our students become better researchers

Presentation Objective: Dispel …

• Search engine myths: Search engines

  • understand a searcher’s query,

  • treat all sites and domains the same when determining results, and

  • determine the results based on the popularity of the site with searchers.

But we’re not equal. I’m .edu.

I’m .net.

Presentation Objective: Dispel …

• Search engine myths:

  • Google accepts payment for ranking a site higher in the search results.

  • Google removes sites from the database that staff find offensive or when requested by searchers.

Myth: Google Accepts “Pay for Ranking”

“At Google we take our commitment to delivering useful and impartial search results very seriously.” “We don’t ever accept payment to add a site to our index, update it more often, or improve its ranking.”

Matt Cutts Head of Google’s Webspam Team http://www.google.com/howgoogleworks

To understand how search engines work …

…we must understand, “search engines have no understanding of words or language. (They) don't recognize user intent, can't distinguish goal-oriented search from browsing search.”

A ResourceShelf Interview: 20 Questions with Dr. Gary Flake, Ph.D. Head of Yahoo! Research Labs http://searchenginewatch.com/showPage.html?page=3372051

Thursday, June 3, 2004

And in 2010 …

“We can write a computer program to beat the very best human chess players, but we can't write a program to understand a sentence anywhere near the precision of a child.”

“Helping Computers Understand Language,” Steven Baker, Google Software Engineer, Official Google Blog, January 19, 2010

And in 2012 …

“Google has a confession to make: It does not understand you. Google Fellow Amit Singhal says Google doesn’t understand the question. ‘We cross our fingers and hope someone on the web has written about these things or topics.’ ”

Google Knowledge Graph Could Change Search Forever http://mashable.com/2012/02/13/google-knowledge-graph-change-search/

If Google doesn’t understand my query …

… how does Google determine how to select and rank the results in response to my query?

What Google Considers on the Webpage

• Google’s algorithms rely on more than 200 unique signals to determine a ranking. For example,

  • how often the search terms occur on the webpage,

  • if the search terms appear in the title or URL, and

  • whether synonyms of the search terms occur on the page.

Facts about Google and Competition http://www.google.com/press/competition/howgooglesearchworks.html An Update to our Search Algorithms (8/10/12) http://insidesearch.blogspot.com/2012/08/an-update-to-our-search-algorithms.html
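The three on-page signals listed on this slide can be illustrated with a toy scoring function. This is only a sketch for classroom discussion: the function name on_page_score, the WEIGHTS values, and the sample pages are all hypothetical, and Google does not publish how its 200-plus signals are actually weighted or combined.

```python
import re

# Hypothetical weights chosen only for illustration; they are NOT Google's.
WEIGHTS = {"body": 1.0, "title": 5.0, "url": 3.0, "synonym": 0.5}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def on_page_score(query, title, url, body, synonyms=None):
    """Score one page for a query using three on-page signals:
    term frequency in the body, presence in the title or URL,
    and occurrences of synonyms of the search terms."""
    synonyms = synonyms or {}
    title_words = set(tokenize(title))
    url_words = set(tokenize(url))
    body_words = tokenize(body)

    score = 0.0
    for term in tokenize(query):
        # Signal 1: how often the search term occurs on the page.
        score += WEIGHTS["body"] * body_words.count(term)
        # Signal 2: whether the term appears in the title or the URL.
        if term in title_words:
            score += WEIGHTS["title"]
        if term in url_words:
            score += WEIGHTS["url"]
        # Signal 3: whether synonyms of the term occur on the page.
        for alt in synonyms.get(term, []):
            score += WEIGHTS["synonym"] * body_words.count(alt)
    return score


page_a = on_page_score(
    "climate change",
    title="Climate Change Facts",
    url="http://example.gov/climate",
    body="Climate change raises sea levels. The climate is warming.",
)
page_b = on_page_score(
    "climate change",
    title="Weather Notes",
    url="http://example.com/notes",
    body="The climate was mild.",
)
print(page_a, page_b)
```

The point for students is that none of these signals involves understanding the question: the page that repeats the words in the right places scores higher.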

What Google Considers Off the Webpage: Links

• PageRank

PageRank – A measure of the number and the quality of links to a webpage.

Assumption - Important webpages receive more links from other webpages.

Facts about Google and Competition www.google.com/press/competition/howgooglesearchworks.html
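The idea that both the number and the quality of links matter can be sketched with the simplified, textbook form of PageRank. The code below is an illustration only: the damping value 0.85, the iteration count, and the six-page link graph are assumptions, and the ranking Google runs in production is not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: "a" has ONE link, from a well-linked hub;
# "b" has TWO links, from pages nobody links to.
links = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["a"],
    "x": ["b"], "y": ["b"],
}
ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this toy graph page "a" outranks page "b" even though "b" has twice as many incoming links, because the single link to "a" comes from a page that is itself heavily linked.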

Question

• Okay, I understand PageRank, but …

of Google states,

“Popularity is different from accuracy and PageRank is different than popularity.”

http://www.youtube.com/watch?v=rNsRpJm3z2g

Therefore, PageRank is different from accuracy.

Let’s test that assertion by searching for …

Search Results

Jew Watch News is the 5th most popular and accurate result for our search.

Jew Watch – A Popular & Accurate Site?

The Value of Quality Links

“With PageRank, five or six high-quality links from websites would be valued much more highly than twice as many links from less reputable or established sites.”

Librarian Central How does Google collect and rank results? http://www.google.com/librariancenter/articles/0512_01.html

Checking the Links to JewWatch.com

Law School Links to Jew Watch.com

Look at Google’s 3rd Result

Google’s Explanation – “subtleties of language”

Please explain why Google does not consider …

… the fact that the site is popular with us, the searchers who view the sites!

Why not consider searchers’ preferences?

“We believe the approach which relies heavily on an individual's tastes and preferences [to rank results] just doesn't produce the quality and relevant ranking that our algorithms do.”

Amit Singhal, Google Fellow, “This stuff is tough,” 25 February 2010 http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

Why!?!

First: “We have all been trained to trust Google and click on the first result.”

“How Google Measures Search Quality” Datawocky http://tinyurl.com/6mpt4u

“College students trust Google; they click on the number one abstract most of the time, even when the abstracts are less relevant.”

In Google We Trust: Users’ Decisions on Rank, Position, and Relevance, Laura Granka, Journal of Computer-Mediated Communication

Trusting Google too Much?

“Second: For informational queries … if a result on page 4 provides better information than the results on the first three pages, users will not know this result exists! Therefore, usage behavior does not provide the best feedback on the rankings.”

But we are the best results!

“How Google Measures Search Quality” Datawocky http://tinyurl.com/6mpt4u

And look at the first three results.

“… 100% of participants looked at the top of the page, 85% looked at the bottom listing. Anything below the fold dropped dramatically to 50% at the top and a lowly 20% at the bottom.”

Image Date - 14th November 2011

Eye Tracking Web Usability Study Reveals the “Golden Triangle” June 14, 2010 http://tinyurl.com/5tj4mqw

Google Gullibility

“Many users are at the search engine's mercy and mainly click the top links — a behavior [called] Google Gullibility. Sadly, while these top links are often not what they really need, users don't know how to do better.”

Jakob Nielsen's Alertbox, February 4, 2008 User Skills Improving, But Only Slightly http://www.useit.com/alertbox/user-skills.html

Consider this …

“The computer screen is … literally a small thing [that] may display just over 300 words. If this world becomes our reality, we actually are relying on less information, not the more that is available.”

“The Google-ization of Knowledge,” Natasja Larson, Laura Servage, and Jim Parsons; Faculty of Education; University of Alberta http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/28/03/99.pdf

Google doesn’t need to consider …

… the popularity of a website with searchers because its algorithm is so up-to-date that Google always returns the best results. Right?

RIGHT!

RIGHT!

Relevance in Google = Only an Opinion

Google’s … “assessments of the "value" of a web page are subjectively-determined [by] formulae to come up with a ranking. PageRanks are opinions. They're professional opinions, but they remain opinions.”

“Google Replies to SearchKing Lawsuit” Google v. http://research.yale.edu/lawmeme/ Thursday, January 9, 2003

Google's rankings are protected opinion.

“The court simply finds there is no conceivable way to prove that the relative significance assigned to a given Web site is false. Accordingly, the court concludes Google's PageRanks are entitled to full constitutional protection.”

“Judge Dismisses Suit Against Google” http://news.cnet.com/2100-1032_3-1011740.html May 30, 2003

Evaluating Google’s Opinion

Google returns all sites with the phrase, martin luther king.

Google’s 4th Result as of 10-10-2012

Martin Luther King.org Homepage

Martin Luther King.org is hosted by …

The student wants to know …

Why was that site returned as the 4th result among the 89.3 million results!?!

I thought Google and other search engines always returned the best results.

Checking for .edu Links to the Webpage

Remember the importance of PageRank, which measures the number and quality of links to a webpage.

• Link Check – Returns results that are linked to a site; for example, .edu sites that are linked to Martin Luther King.org.

Link Check Results

QUESTION: By reviewing the webpage description, can you determine the purpose of the .edu sites linking to Martin Luther King.org?

Links from .edu & .gov Sites = Trust

“Google places a heavy bias on informational resources; .edu and .gov sites tend to rank higher than others.”

“Google is the best at determining true link quality and places a lot of weight on domain trust levels.”

“Can You Please Them All?” http://www.bruceclay.com/blog/archives/2006/08/can_you_please.html

Will Google remove MLK.org?

“The beliefs and preferences of those who work at Google, [and] the opinions of the general public, do not determine or impact our search results. [W]e do not remove a page from our results because its content is unpopular or because we receive complaints concerning it.”

“An explanation of our search results” http://www.google.com/explanation.html

Google’s opinion is important; …

What can I do to influence the results returned by Google?

Question.

• Search Engine Components

  • Spider/Web Crawler/Robot

  • Index

  • Search Engine

• The only feature that you can control is the query entered into the search engine.
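The three components named above can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the three-page WEB dictionary stands in for the real web (no network access), and the function names crawl, build_index, and search are hypothetical, not any search engine's actual code.

```python
import re
from collections import defaultdict

# A hypothetical three-page "web": each page has text and outgoing links.
WEB = {
    "a.example/home": {
        "text": "Climate change and sea levels",
        "links": ["a.example/data", "b.example/news"],
    },
    "a.example/data": {"text": "Global temperatures data", "links": []},
    "b.example/news": {"text": "Sea levels news", "links": ["a.example/home"]},
}


def crawl(start):
    """Spider: follow links from the start page and list every page reached."""
    seen, queue = [], [start]
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.append(url)
        queue.extend(WEB[url]["links"])
    return seen


def build_index(urls):
    """Index: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url in urls:
        for word in re.findall(r"[a-z0-9]+", WEB[url]["text"].lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Search engine: return pages containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


pages = crawl("a.example/home")
index = build_index(pages)
print(search(index, "sea levels"))
print(search(index, "sea temperatures"))
```

The crawler and the index are built before the searcher arrives; the only input the searcher supplies is the query string passed to search, which is why query construction is the skill worth teaching.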

Keyword Searching

“Keyword-based search works well if the users know exactly what they want and formulate queries with the ‘right words.’ It does not help much and is sometimes even hopeless if the users only have vague concepts about what they are asking.”

Let’s go see the librarian.

Toward Topic Search on the Web, Microsoft Research, March 2011 http://research.microsoft.com/apps/pubs/default.aspx?id=145837

Searching With as Few Words as Possible

“There's a real imbalance in Web search. Users give us three words at a time. People type the query "map," and then they get upset if it's not the map they were thinking of.”

81% of search engine queries are 4 words or less.

“The Future of Search: The head of Google Research talks about his group's projects.” MIT Technology Review http://tinyurl.com/2pmfsu

And Never Mention the Topic

“We find that searchers turn so quickly to Google that they don't think about what they're searching for. It's surprising, we'll see people trying to find out something about a topic, but never mention the topic.”

The Art of the Field Study http://googleblog.blogspot.com/2008/11/art-of-field-study.html

Queries by Middle School Students

“A predominate difficulty students experience while performing Web-based research is constructing effective search strings.”

“[M]iddle school students demonstrate unsophisticated skills when constructing search strings, using mainly broad terms and phrases.”

“Internet Searching by K-12 Students: A Research-based Process Model” http://eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1b/a8/26.pdf

Queries by High School Students

“[H]igh school students struggle with conceptualizing the topic for their query, sometimes omitting required concepts.”

“Internet Searching by K-12 Students: A Research-based Process Model” http://eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/1b/a8/26.pdf

Queries by College Students

“[S]earch engines generally performed poorly, a lack of computer skills and an inability to construct appropriate search statements limited [college] students' success.”

Nowicki, Stacy. Student vs. Search Engine: Undergraduates Rank Results for Relevance. portal: Libraries and the Academy - Volume 3, Number 3, July 2003

Can students craft a four-word query on …

… the effects of climate change on global temperatures and sea levels worldwide?

Why are WIKIPEDIA results ranked so highly?

Why Search? Just Take Me to WikipediA.

• Is there a preference for WIKIPEDIA or other top level domains?

Yes!

“Google’s authority-based algorithm is domain-centric. Google has focused on domain-trusting by pushing to the top of the results massive sites like Wikipedia that couldn’t have been created by spammers.”

The Google Cache Google’s New Algorithm: if($domain==’wikipedia.org’){$rank=1;} http://tinyurl.com/yv3xo6

WIKIPEDIA – A Most Important Domain

But this is not the answer!

The Remedy

“Given their popularity, knowing more about search engines is vital to understanding information access in a digital age.”

I understand Google; I’ll search for title pages with domain-limited Boolean queries.

The social, political, economic, and cultural dimensions of search engines: An introduction. Hargittai, E., (2007). Journal of Computer-Mediated Communication, 12(3), article 1. http://jcmc.indiana.edu/vol12/issue3/hargittai.html

Teach Students Advanced Search Syntaxes

“Advanced syntax users demonstrate search expertise that the majority of user population does not. They are:

• more adept at combining query operators to formulate powerful query statements and

• return more relevant results

Not only were they more successful in their searching, they were consistently more successful.”

Investigating the Querying and Browsing Behavior of Advanced Search Engine Users research.microsoft.com/~ryenw/papers/WhiteSIGIR2007b.pdf

Searching: An Aid to Complex Reasoning

Brain Activity from Internet Search The image on the left displays brain activity while reading a book; the image on the right displays activity while engaging in an Internet search.

“Internet searching engages complicated brain activity which may help improve brain function. [The] Web-savvy group registered activity in the areas of the brain which control decision-making and complex reasoning.”

“UCLA study finds that searching the Internet increases brain function” November 15, 2008 http://newsroom.ucla.edu/portal/ucla/ucla-study-finds-that-searching-64348.aspx

Google Advanced Search Syntax Query

intitle:“climate change” AND “global temperatures” AND “sea levels” AND site:gov

• This query will find results:

  • from a .gov website,

  • with “climate change” in the title of the webpage, and

  • the phrases “global temperatures” AND “sea levels” on the webpage.
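For students who are comfortable with a little code, the same query can be assembled from its parts, which makes the role of each operator visible. This is a minimal sketch: the helper name build_query is hypothetical, the operators intitle: and site: are the ones shown on the slide, and the search URL pattern is an assumption used only to show how the query would be encoded for a browser.

```python
from urllib.parse import quote_plus


def build_query(title_phrase, body_phrases, domain):
    """Assemble an advanced-syntax query from its three parts:
    a phrase required in the page title, phrases required on the page,
    and a domain restriction."""
    parts = [f'intitle:"{title_phrase}"']
    parts += [f'"{phrase}"' for phrase in body_phrases]
    parts.append(f"site:{domain}")
    return " AND ".join(parts)


query = build_query(
    "climate change",
    ["global temperatures", "sea levels"],
    "gov",
)
print(query)

# Assumed URL pattern for illustration; spaces and punctuation are encoded.
search_url = "https://www.google.com/search?q=" + quote_plus(query)
print(search_url)
```

Changing the last argument from "gov" to "edu" limits the same search to educational sites, which is a quick way to show students how one operator reshapes the result list.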

VAASL 2012: Librarians as Leaders 62

Google Advanced Search Syntax Results

NOTE All results are .gov sites, with “climate change” in the titles, and the terms, “global temperatures” and “sea levels” in the webpage descriptions.

VAASL 2012: Librarians as Leaders 63 From Google to the Library Databases

 Demonstrate that Google syntaxes and queries,

with minor modification work in the proprietary

databases and may provide more relevant sources.

VAASL 2012: Librarians as Leaders 64 From Google to a Library Database

Students can limit the search to the structure of the document.

VAASL 2012: Librarians as Leaders 65 Academic Journals

VAASL 2012: Librarians as Leaders 66 It works - minds can be changed!

“I modeled my presentation, ‘Internet Privacy Laws are

Necessary,’ for juniors (AP English 11) after the VEMA presentation.

“They searched their usual way. Then I demonstrated a search using advanced search techniques. Then they searched using advanced search syntaxes.”

“They narrowed their search results from one million to 55 and it was amazing how many hits were on target.”

Nancy Keenan; Library Media Specialist/Computer Coordinator Glenvar High School Salem, Virginia

VAASL 2012: Librarians as Leaders 67 Minds can be changed!

“I revised my lesson plan for teaching students how to search the

Web and library databases. Students were

frustrated using the Web; when we got to

Gale and ABC-CLIO their amazement in the difference of the quality of

information was priceless. One student researching working women of

the 1930s said, ‘I found much more in Student Resource Center.’ ”

Lori Donovan, NBCT, Teacher-Librarian Thomas Dale High School Chester, Virginia

VAASL 2012: Librarians as Leaders 68 The Importance of “Friends!”

Gang, the librarian is better than Google!

http://www.oclc.org/reports/2005perceptions.htm

VAASL 2012: Librarians as Leaders 69 And the winner is …

VAASL 2012: Librarians as Leaders 70