Introduction to Google

Jian-hua Yeh (葉建華)

Lecture Outline

• What are Google’s services?
• Inventing Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do

The Google Services

• Web search
• Images search
• Video search
• News search
• Maps
• Mail
• More?

10 Cool Things You Can Do With Google

1. Basic Searching

Basic Searching Step-by-Step

Select search term(s)
Enter search term(s) into the search box
Click Search or press the Enter key
Browse results

2. Advanced Searching

Click on “Advanced Search” on the main Google page

Go to www.googleguide.com for more on how to use Google’s Basic and Advanced Search

Better Searches, Better Results

Exact phrase: ["one small step for man"]
Excluded words: [bass -fishing], [virus -computer]
Similar words: [~mobile phone]
Multiple words (or): [Maui OR Hawaii]
Multiple words (and): [vacation Hawaii]

“I’m Feeling Lucky” takes you directly to the first web page returned for your query.
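The operators above combine into a single query string. The following is a minimal sketch of that composition; the function `build_query` and its parameter names are hypothetical helpers for illustration, not part of any Google API.

```python
def build_query(all_words=(), exact_phrase=None, any_words=(), excluded=()):
    """Assemble a search string from the operators listed on the slide."""
    parts = list(all_words)                    # plain words are combined with an implicit AND
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')      # double quotes request an exact phrase
    if any_words:
        parts.append(" OR ".join(any_words))   # OR is written in upper case, as on the slide
    parts.extend(f"-{word}" for word in excluded)  # a leading minus excludes a word
    return " ".join(parts)


print(build_query(all_words=["bass"], excluded=["fishing"]))
print(build_query(any_words=["Maui", "Hawaii"]))
print(build_query(exact_phrase="one small step for man"))
```

The resulting strings can be typed into the search box as they are.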

3. Definitions

“define ______” or “define: ____”
Definitions gathered from around the Web

Define “Blog”

4. Calculator

Addition: +
Subtraction: -
Multiplication: *
Division: /
Percentages: % of
Exponents: ^

“15.99 + 32.50 + 13.25”
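The calculator examples can be checked locally. This is a small sketch that uses Python's `decimal` module as a stand-in for the Google calculator; the variable names and the percentage and exponent inputs are illustrative assumptions.

```python
from decimal import Decimal

# The sum from the slide, computed with exact decimal arithmetic.
total = Decimal("15.99") + Decimal("32.50") + Decimal("13.25")

# "15% of 80" written out as a plain expression (illustrative input).
percent_of = Decimal("15") / Decimal("100") * Decimal("80")

# "2^10" uses ** in Python rather than ^ (illustrative input).
power = 2 ** 10

print(total, percent_of, power)
```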

5. Numbers

Phone #s
Tracking #s
VIN #s
UPC codes
Area codes
More…

Examples of Number Searches

Phone numbers
Area codes
Tracking packages by #
UPC codes
VIN #s

6. Movies

Showtimes: “movies 91360”
Reviews
Buy tickets online

7. Stocks

Find reports on specific stocks
Compare stocks by entering multiple stock symbols

8. Weather

Weather forecasts for specific regions of the world
Example: “weather 91360”

9. Travel

Airport weather and delays
Airline flight information

Examples: “lax airport” and “United 164”

10. Pizza!

Find local businesses by typing in a keyword (like “pizza”) and your zip code

More?

Yes, there are more…

Lecture Outline

• What are Google’s services?
• Inventing Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do

Inventing Google

• Sergey & Larry
  – Ph.D. students at Stanford University
• Prototype (1998)
  – http://google.stanford.edu
  – 24,000,000 pages (8,058,044,651 today)
• Google
  – “We chose our system name, Google, because it is a common spelling of googol, or 10^100 and fits well with our goal of building very large-scale search engines.”
• PageRank
  – An objective measure of a page's citation importance that corresponds well with people’s subjective idea of importance.

Google’s Mission

“Organize the world’s information and make it universally accessible and useful.”

Google’s Goal

“To provide a much higher level of service to all those who seek information, whether they're at a desk in Boston, driving through Bonn, or strolling in Bangkok.”

Business Ethics

1. Focus on the user and all else will follow.
2. It's best to do one thing really, really well.
3. Fast is better than slow.
4. Democracy on the web works.
5. You don't need to be at your desk to need an answer.
6. You can make money without doing evil.
7. There's always more information out there.
8. The need for information crosses all borders.
9. You can be serious without a suit.
10. Great just isn't good enough.

Inventing Google: Foundation

• PageRank:
  – We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d... Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
    PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

[Figure: pages T1 … Tn, with outgoing link counts C1 … Cn, pointing to page A]

Inventing Google: Foundation

• PageRank formula, informally
  – PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
  – PageRank can be thought of as a model of user behavior.
  – We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back", but eventually gets bored and starts on another random page.
  – The probability that the random surfer visits a page is its PageRank.
• A page has a high PR if…
  – there are many pages that point to it
  – or there are some pages that point to it and have a high PR
  – Note the recursive weight propagation through the web link structure.
  – Note that the PageRanks form a probability distribution over web pages only once normalized: with the formula as written the average PageRank is 1 (the values sum to the number of pages N when every page has outgoing links); dividing the (1-d) term by N makes the sum one.
  – With damping factor d, the "random surfer" follows a link with probability d and gets bored and requests another random page with probability 1-d.
• Personalization ☺
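The formula can be evaluated by simple repeated substitution. Below is a minimal sketch, assuming a tiny hand-made link graph and a fixed number of iterations; the function name `pagerank`, the three-page graph, and the iteration count are illustrative choices, not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}          # start every page at 1
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Each page T that links to A passes on PR(T) split over its C(T) out-links.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if a in links.get(t, [])
            )
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C collects links from both A and B, so it ends up ranked highest, and A inherits most of C's rank through C's single out-link.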

Inventing Google: Foundation

• PageRank relevancy tuning
  – Page title
  – Anchor text
  – Meta
  – Font
    • Size
    • Weight
  – Capitalization
  – …

Inventing Google: Anatomy

• URL Server
  – Provides the list of URLs to be fetched to the crawlers
• Google Crawlers (GoogleBot)
  – Multiple distributed crawlers
    • Own DNS cache
    • 300 connections open at once
  – Send fetched pages to the Store Server
  – Originally written in Python
• Store Server
  – Compresses pages and stores them in the Repository
  – A DOCID is created for each page
• Repository
  – Stores fetched pages for further processing by the Indexer

Inventing Google: Anatomy

• Indexer
  – Reads pages from the Repository (uncompresses them)
  – Parses each document (Flex on top of its own stack):
    • Page converted to a set of Hits (position, font, capitalization, title/anchor/meta), 2 bytes each
    • Added to the Document Index
  – Hits are distributed to Barrels (i.e. one document to multiple barrels)
  – Every link found in a page is stored to the Anchors file
• Forward and Inverted Barrels (2*64)
  – Forward Index
    • Each Barrel keeps a range of Hits sorted by DOCID
    • (DOCID, (WORDID, word’s Hit reference+)+)
  – Processed by the Sorter:
    • Generates the inverted index from the forward index by sorting Hits by WORDID
    • Creates (WORDID, offsets) used by the Lexicon
  – Inverted Index (short/full)
    • (WORDID, (DOCID reference, Hit list reference)+)
    • Short: DOCIDs sorted by / containing just quality Hits (word in title, anchor, ...); optimal for single-word search
    • Full: DOCIDs sorted by DOCID; optimal for merging Hit lists, i.e. multi-word search
• Anchors file
  – Anchor (from, to, text)
• URL Resolver
  – Reads the Anchors file:
    • Relative-to-absolute URL conversion + DOCID assignment
    • Creates the Links file
• Links file
  – (url, target: DOCID)
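The Sorter step (forward index to inverted index) and the multi-word merge can be sketched in a few lines. This is a toy illustration only: words stand in for WORDIDs, the three documents are invented, and the function names `invert` and `search` are hypothetical.

```python
from collections import defaultdict

# Toy forward index: DOCID -> list of (word, position) hits.
forward_index = {
    1: [("google", 0), ("search", 1)],
    2: [("search", 0), ("engine", 1)],
    3: [("google", 0), ("engine", 1), ("search", 2)],
}


def invert(forward):
    """Regroup hits by word, keeping DOCIDs in ascending order per word."""
    inverted = defaultdict(dict)
    for doc_id in sorted(forward):
        for word, position in forward[doc_id]:
            inverted[word].setdefault(doc_id, []).append(position)
    return dict(inverted)


def search(inverted, words):
    """Multi-word search: intersect the DOCID sets of every query word."""
    postings = [set(inverted.get(word, {})) for word in words]
    return sorted(set.intersection(*postings)) if postings else []


inverted_index = invert(forward_index)
print(search(inverted_index, ["google", "search"]))
print(search(inverted_index, ["engine"]))
```

Keeping each word's DOCIDs in sorted order is what makes merging hit lists for several words cheap, which matches the role the slide gives to the full inverted index.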

Inventing Google: Anatomy

• Searcher uses…
  – Lexicon
    • Keeps a map saying which Barrel to use.
    • Originally kept in memory (256MB).
      – IMHO something like a multi-level VM page table must be used now
      – It is/was of fixed size (14,000,000 words)
  – Barrels
    • Each barrel keeps a range of WORDIDs
    • WORDID-to-DOCID map
  – PageRank pool
    • Keeps the computed PageRank for each DOCID
  – Doc Index
    • DOCID-ordered information about each document
      – (DOCID, status, repository pointer, checksum, stat, URL, title)

Cluster Innards

Cluster Innards: Global Google

• Over 30 Google clusters around the world.
  – DNS-based and geolocation-driven load balancing:
      Domain Name: GOOGLE.COM
      Registrar: ALLDOMAINS.COM INC.
      Whois Server: whois.alldomains.com
      Referral URL: http://www.alldomains.com
      Name Server: NS2.GOOGLE.COM
      Name Server: NS1.GOOGLE.COM
      Name Server: NS3.GOOGLE.COM
      Name Server: NS4.GOOGLE.COM
      Status: REGISTRAR-LOCK
      Updated Date: 03-oct-2002
      Creation Date: 15-sep-1997
      Expiration Date: 14-sep-2011
• 2005, May 7: Google DNS hack speculations
• Total PCs
  – > 5,000 in 2000
  – > 15,000 in 2003
  – > 79,000* in 2004

*) I’m not sure about this number, it was taken from an external resource.

Cluster Innards: HW

• Basic cluster design insights
  – Reliability in SW rather than server-class HW.
    • Commodity PCs used to build a high-end computing cluster at low-end prices.
    • Example:
      – $287,000 – 176x 2GHz Xeon, 176GB RAM, 7TB HDD
      – $758,000 – 8x 2GHz Xeon, 64GB RAM, 8TB HDD
  – Design is tailored for best aggregate request throughput, not peak server response time – individual request parallelization.
• Google has inexpensively built out its computing infrastructure by using thousands of "commodity" servers
  – <2,000 servers in a single cluster.
  – Dual-processor x86 servers (starting at 533MHz Celeron) with 2-4 GB of memory per machine, 1+ 80GB IDE drive.
  – Rack: 40-80 x86-based servers.
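The price example above is easier to compare per unit of resource. A small sketch follows, assuming the $287,000 configuration is the commodity rack and the $758,000 configuration is the server-class machine; those labels and the helper `cost_per` are assumptions for illustration.

```python
# Figures copied from the slide; the configuration labels are assumed.
configs = {
    "commodity rack": {"price_usd": 287_000, "cpus": 176, "ram_gb": 176, "disk_tb": 7},
    "server-class": {"price_usd": 758_000, "cpus": 8, "ram_gb": 64, "disk_tb": 8},
}


def cost_per(config, resource):
    """Price in USD divided by the amount of one resource."""
    return config["price_usd"] / config[resource]


for name, config in configs.items():
    print(
        name,
        f"USD per CPU: {cost_per(config, 'cpus'):,.0f}",
        f"USD per GB RAM: {cost_per(config, 'ram_gb'):,.0f}",
    )
```

Per CPU, the larger configuration costs roughly 1,600 USD against about 95,000 USD for the eight-processor machine, which is the gap the "reliability in SW" design exploits.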

Cluster Innards: HW

• Optimistically, a consumer PC might crash once in three years from a software glitch or hardware problem.
  – "At Google scale...if you have thousands of PCs, you can expect one (failure) a day,…"
• 1,000,000s not 1,000,000,000s of dollars.
  – “The trick is to make these racks of hardware work together and to ensure that the failure of one machine doesn't derail an operation.”
• Switched Ethernet
  – Commodity networking hardware is used, typically either 100 megabits/second or 1 gigabit/second at the machine level, but averaging considerably less in overall bisection bandwidth.
  – Locality optimizations (GFS)

Cluster Innards: SW

• Stripped-down version of Linux, which is based on the Red Hat distribution but is really just the operating system kernel modified for Google.
• The Google File System (GFS) is optimized for handling large blocks of data.
  – 64MB block
  – The file system was designed to assume that a failure, such as a failed disk or unplugged network cable, can happen at any time.
  – Data is replicated in three places, and there is a "master" machine that can locate copies of a piece of data, such as a keyword index, if the original is out of commission.
• Google has created "batch" job scheduling software, called the Global Work Queue, that acts as a sort of taskmaster for millions of operations.
• Another important engineering feat done by Google is to make writing programs that run across thousands of servers very straightforward…
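The assumption that failures can happen at any time follows from simple arithmetic on the "once in three years" figure quoted in the hardware slides. A rough sketch, assuming failures are spread evenly over time and independent between machines; the function name is hypothetical.

```python
def expected_failures_per_day(machines, years_between_failures=3.0):
    """Average failures per day if each machine fails once per given period."""
    return machines / (years_between_failures * 365)


# Cluster sizes quoted in the slides (>5,000, >15,000 and >79,000 PCs).
for machines in (1_000, 5_000, 15_000, 79_000):
    print(machines, round(expected_failures_per_day(machines), 1))
```

Around a thousand machines already give close to one failure per day, so with tens of thousands of PCs several failures a day are the normal case rather than an exception.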

Lecture Outline

• What are Google’s services?
• Inventing Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do

YEAR  MONTH          EVENT
1995  March          Sergey Brin and Larry Page meet at a Stanford University spring gathering of Ph.D. computer science candidates.
1996  Jan-Dec        Brin and Page create BackRub, the precursor to the Google search engine.
1998  September      Google is incorporated and takes up residence in a Menlo Park, Calif., garage with four employees, after Brin and Page put their studies on hold and raise $1 million in funding from family, friends and "angel" investors. 10,000 search queries per day.
1999  Feb-June       $25 million in funding from venture capital funds Sequoia Capital and Kleiner Perkins Caufield & Byers; eight employees; Google answers 500,000 searches per day.
2000  May-June       Google, answering 18 million search queries a day, becomes the largest search engine on the Web. Internet media company Yahoo picks Google as its default search results provider.
2001  March-April    Eric Schmidt, CEO of Novell and a former chief technology officer at Sun Microsystems, joins Google as chairman.
2001  July-August    Schmidt is appointed CEO while Page becomes president, products, and Brin becomes president, technology.
2001  September      Google announces that it has achieved profitability.
2002  Jan-Feb        Google announces the availability of “”.
2002  March          Google launches a beta version of Google News, which provides news stories from numerous global providers.
2002  Nov-Dec        Web index now includes 4 billion web documents.
2003  Jan-Feb        Google acquires Pyra Labs, creator of the Web self-publishing tool Blogger.
2003  May-June       Google launches AdSense, an advertising program that delivers ads based on the content of Web sites.
2004  March-April    Gmail, a free web-based email service, is launched.
2004  July           Google acquires Picasa, Inc., a digital photo management company.
2004  August         IPO of “GOOG” on NASDAQ at $85 per share, raising $1.7 billion.
2004  November       Google search index is now 8 billion pages.
2005  March          “” is launched.
2005  July           GOOG share price passes $300 and Google becomes the world’s largest media company by market value, at approximately $85 bn.

Strategic Analysis

• Market share in online searches: 56.03%
  – Who are the competitors?

Strategic Analysis

• Number of searches a day: 4.03 billion
• Web pages indexed: 25 billion
• Images indexed: 1.3 billion

Corporate Now

• Employees: 12,000+

Financial Success

• Market capitalization: 166 billion USD
• Two years after going public, the stock price has grown five-fold
• Revenue of 10.06 billion and profit of 3.077 billion in 2006

Comparison With Yahoo

Google Stock Growth vs. Industry vs. DJ

Google Competitors

Acquisitions and Mergers

Core Competencies

Google “people” and environment/culture

People have to be extremely intelligent and usually have doctorates; people come into Google with forward-thinking, innovative and “out-of-the-box” strategies.

Search

Quality, popularity, overwhelming awareness of the name and of what the company is and does.

Google's Brand Equity

“Google” is now a verb in Webster’s dictionary.

2003 Most recognized brand of the year.

Corporate Culture

20% Time Philosophy

Employees spend 20% of their work time on projects that interest them.

Half of new product launches originated from 20% time.

Some of Google's newer services, such as Gmail, Google News, and AdSense, originated from these independent endeavors.

So, What Is This?

The Answer Is…

Lecture Outline

• What are Google’s services?
• History of Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do
• How do libraries compete with Google?

Cool things you can do with Gmail (gmail.com)

From Gmail to…

Google Docs

Revisions

Google Docs Revision

Photos

Groups

Picture

Your Picture

Searching Mail

Sending & Receiving Mail

Click here to reply

Auto Save

Receiving & Attaching File

Receiving PPT & MP3

Starred

Labels

Chatting

Those are just some of the cool things you can do!

Lecture Outline

• What are Google’s services?
• History of Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do
• How do libraries compete with Google?

The Web 2.0 Mergers

By          To           Date/Scale          Attribute
Yahoo!      Flickr       2005/01, 2M USD     Online photos
Yahoo!      Del.icio.us  2005/12, N/A        Social bookmark
Google      Writely      2006/03, N/A        Web-based word processing
Google      YouTube      2006/10, 1.65B USD  Video blog
News Corp.  MySpace      2005/07, 0.58B USD  Blog

Google Office – What Is Writely?

• Writely was acquired by Google in 2006/03
  – A web-based word processing service provider
  – Spell checking, etc.
  – MS-Word documents can be processed
  – Software installation is not necessary

Writely-able Environment

Writely can be run on any Internet-connected Windows or Macintosh computer with one of the following browsers:
IE 5.5+ (available on the Windows platform only)
Mozilla 1.4+ (available on Mac, Windows and Linux platforms)
Firefox 1.0.6+ (available on Mac, Windows and Linux platforms)

Functions of Writely

• Upload MS-Word documents, HTML pages, or text files.
• Create new documents.
• WYSIWYG editing for document formatting, with spell checking.
• Share documents with others by email.
• Cooperative document editing online.
• File revision history, including version rollback.
• Publish a document publicly, or set permissions on who may view it.
• Download documents in MS-Word, HTML or ZIP format.
• Publish a document to a blog.

The “Autosave” Feature

• Writely autosaves documents automatically every ten seconds.
  – This keeps work safe in the event of a software or hardware failure.

Compare Google Office and MS Office

Google vs. Microsoft

Google Office            Microsoft Office
Gmail & Calendar         Outlook
Writely (Google Docs)    Word
Google Spreadsheet       Excel
Google Base              Access
Google Thumbstacks       PowerPoint
Free                     $350-$499

Google vs. Other Web Services

Service         Google.com  Yahoo.com  MSN.com
Groups          Yes         Yes        Yes
Picasa          Yes         Yes        Yes
Talk            Yes         Yes        Yes
Upload Video    Yes         Yes        No
Maps            Yes         Yes        Yes
News            Yes         Yes        Yes
Upload Images   Yes         Yes        No
Friends         No          Yes        Yes
Knowledge       No          Yes        No
Blog            Yes         Yes        Yes
Mail            Yes         Yes        Yes
Directory       Yes         Yes        Yes
Bid             No          Yes        No
Shop            No          Yes        No
Froogle         Yes         Yes        No

Google Spreadsheet

Google Office Advantage?

• Security
• Privacy
• Physical connection quality
• Internet quality
• Free of charge

How about offline editing?

Lecture Outline

• What are Google’s services?
• History of Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do
• How do libraries compete with Google?

iGoogle: the Personal Organizer Page

Considering a POP

• Why do I need a POP?
• What is its purpose?
• What content do I want to include?
• Who do I want to view my POP?
• Where will I publish?
• How will I promote it?
• How could my learners use one?

Building a POP in Google

• Step 1: Open your browser and go to Google (www.google.com)

Getting an account

• Step 2: Select the Sign In icon
• Step 3: Create a

Accessing your account

• Step 4: Click the Sign In icon once more
• Step 5: Enter your email address and password

Personalising your home

• Step 6: Select the Personalised Home icon

Adding a tab

• Step 7: Select the Add a tab icon
• Step 8: Key in a title and click OK
• Note: If you leave the tick in place, Google will use a typical template for the tab

Sample template tab

Moving widgets around

1. Select the widget by its title
2. Drag it to a new position on your page

Editing your bookmarks

1. Select Edit
2. Add a link to your favourite web space
3. Save

Expanding a widget

1. Select the + symbol to expand
2. Select the – symbol to contract

Adding widgets

Add more widgets to a tab by clicking on Add stuff

Adding stuff

Add a widget by clicking on the Add it now icon

Check out the new widget

Make your iGoogle your home page

Lecture Outline

• What are Google’s services?
• History of Google
• Current status of Google
• Gmail service
• Google Office?
• iGoogle?
• What Google cannot do

What Google Cannot Do

• Is Google still a traditional search application?
  – What is traditional search?

Traditional Search Principle

Google Is Trying to…

• Add shallow linguistics to traditional search

But…

Semantic Approaches to Search

• Beyond bag-of-words, use terms and concepts instead.
• An ontology can help the user to:
  – Formulate a semantic query
  – Refine a previous query
  – Browse the concept domain
  – Formulate a related query
  – Interoperability between search applications
  – Semantic indexing of documents

Ontology in Semantic Exploration

• Use graphical ontologies for query formulation
  – Semantic annotations of documents
  – Construct queries graphically
  – Use ontological structures to expand a query
  – Use the ontology to visualize search results

Query Formulation

• Queries expanded from ontological structures
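Query expansion from an ontology can be illustrated with a toy structure. The sketch below assumes a hand-written dictionary of "narrower" and "related" terms; the ontology contents, the relation names, and the function `expand_query` are all hypothetical and stand in for whatever ontology a real system would use.

```python
# Hypothetical miniature ontology: term -> relation -> linked terms.
ONTOLOGY = {
    "vehicle": {"narrower": ["car", "bicycle"], "related": ["transport"]},
    "car": {"narrower": ["sedan", "suv"], "related": ["automobile"]},
}


def expand_query(terms, ontology, relations=("narrower", "related")):
    """Add ontology neighbours of each query term, keeping order and skipping duplicates."""
    expanded = []
    for term in terms:
        candidates = [term]
        entry = ontology.get(term, {})          # terms outside the ontology stay as they are
        for relation in relations:
            candidates.extend(entry.get(relation, []))
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query(["car", "rental"], ONTOLOGY))
print(expand_query(["vehicle"], ONTOLOGY, relations=("narrower",)))
```

Restricting `relations` to a subset is one simple way to let the user refine the query by choosing which ontological links to follow.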

Query Refinement

• Use ontological structures to explore the domain

Ontology-driven Query Interpretation

Training Ontology for Search

Personalized Ontology

Semantic Search Query

Conclusion

Is Google good, bad, or evil?