Before the Federal Trade Commission Washington, DC 20580

In the Matter of Google, Inc.

REQUEST FOR INVESTIGATION AND COMPLAINT FOR INJUNCTIVE RELIEF

INTRODUCTION

1. Google, the largest search engine in the United States, has repeatedly touted the numerous ways in which it protects user privacy, particularly with regard to the terms that consumers search for using the company’s search engine. However, the company has consistently designed its services to ensure that these search queries, which often reflect highly sensitive information, are routinely transferred to marketers and other third parties.

2. This complaint concerns the intentional leakage of search query information to third parties by Google. This practice adversely impacts billions of searches conducted by millions of consumers. Google’s sharing of this data is a Deceptive Trade Practice, subject to review by the Federal Trade Commission (the “Commission”) under section 5 of The Federal Trade Commission Act, and should be reversed.

PARTIES

3. Christopher Soghoian is a Washington, DC based Graduate Fellow at the Center for Applied Cybersecurity Research at Indiana University, and a Ph.D. Candidate in the School of Informatics and Computing at Indiana University. His research is focused at the intersection of security, privacy, law and policy. He has previously worked for the Federal Trade Commission,1 the Berkman Center for Internet and Society at Harvard University, The American Civil Liberties Union of Northern California, Google, Apple and IBM Research.

1 Mr. Soghoian was employed by the Federal Trade Commission between September, 2009 and August, 2010 as a technologist within the Division of Privacy and Identity Protection. During his time at the FTC, Mr. Soghoian was prohibited from working on any Google related matters, per a decision by the Office of General Counsel, who determined that Mr. Soghoian’s pre‐FTC academic research and other writings were sufficiently critical of Google to create the possibility of a perception of bias against the company. Mr. Soghoian came up with the idea for this complaint, did all the research, and wrote the entire thing himself, in his own time. He has not been instructed to write this complaint by someone else, nor financially compensated for it in any way.

4. Google, Inc. ("Google") was founded in 1998 and is based in Mountain View, California. Google’s headquarters are located at 1600 Amphitheatre Parkway, Mountain View, CA 94043. At all times material to this complaint, Google’s course of business, including the acts and practices alleged herein, has been and is in or affecting commerce, as "commerce" is defined in Section 4 of the Federal Trade Commission Act, 15 U.S.C. § 45.

THE IMPORTANCE OF PRIVACY PROTECTION

5. The right of privacy is a personal and fundamental right in the United States. The privacy of an individual is directly implicated by the collection, use, and dissemination of personal information. The opportunities to secure employment, insurance, credit, to obtain medical services and the rights of due process may be jeopardized by the misuse of personal information.

6. Courts have recognized a privacy interest in the collection of information that concerns Internet use even where the information may not be personally identifiable.

7. The Federal Trade Commission has a statutory obligation to investigate and prosecute violations of Section 5 of the Federal Trade Commission Act where the privacy interests of Internet users are at issue.

STATEMENT OF FACTS

SEARCH ENGINE QUERIES CONTAIN SENSITIVE INFORMATION, DESERVING OF PRIVACY PROTECTIONS

8. Leading thinkers in the privacy community have long argued that consumers “treat the search [engine] box like their most trusted advisors. They tell the box what they wouldn’t tell their own mother, spouse, shrink or priest.”2 Peer reviewed academic studies confirm this fact, particularly regarding the use of search engines to look up sensitive health information.3

2 http://www.theinvestigativefund.org/investigations/rightsliberties/1274/the_cloud_panopticon

3 Gunther Eysenbach and Christian Köhler, “How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in‐depth interviews,” BMJ 2002; 324:573, available at http://www.bmj.com/cgi/content/full/324/7337/573.

9. In August 2006, AOL released an “anonymized” dataset of 20 million search queries conducted by 650,000 AOL users over a three month period. The data included search queries revealing names, addresses, local landmarks, medical ailments, credit card numbers and social security numbers. AOL’s management soon apologized for the “screw up,”4 firing the company’s Chief Technology Officer and several other employees.5 AOL’s release of the data also resulted in an FTC complaint from the Electronic Frontier Foundation6 and a class action lawsuit.7

10. Journalists from the New York Times were able to re‐identify individual “anonymized” AOL search users due to the vanity searches they had conducted, and then link other, non‐vanity search queries in the dataset to those individuals through the cross‐session identifiers (cookies) included in the dataset.8

11. While there are several technologies available to consumers to better protect their privacy online, none effectively protect users’ vanity searches.9

12. Soon after the release of the search query data by AOL, Google’s CEO called AOL's release of user search data "a terrible thing."10

4 Michael Arrington, “AOL: This was a screw up,” TechCrunch, August 7, 2006, available at: http://techcrunch.com/2006/08/07/aol‐this‐was‐a‐screw‐up/.

5 Barry Schwartz, “AOL Fires CTO & Two Employees After Search Records Slip Up,” Search Engine Watch, August 21, 2006, available at: http://blog.searchenginewatch.com/060821‐142810.

6 Electronic Frontier Foundation, Request for investigation and complaint for injunctive relief, August 14, 2006, available at ://w2.eff.org/Privacy/AOL/aol_ftc_complaint_final.pdf.

7 Danny Sullivan, “Class Action Lawsuit Filed Against AOL Over Search Data Release,” Search Engine Watch, September 26, 2006, available at: http://blog.searchenginewatch.com/060926‐075713.

8 Michael Barbaro and Tom Zeller Jr, “A Face is Exposed for AOL Searcher No. 4417749,” The New York Times, August 9, 2006, available at: http://www.nytimes.com/2006/08/09/technology/09aol.html.

9 Christopher Soghoian, “The Problem of Anonymous Vanity Searches,” I/S: A Journal of Law and Policy for the Information Society, Volume 3, Issue 2, 2007, available at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=953673.

10 J. Nicholas Hoover, “AOL Search‐Term Data Was Anonymous, But Not Innocuous”, InformationWeek, August 14, 2006, available at: http://www.informationweek.com/news/software/showArticle.jhtml?articleID=191901983

13. In 2006, the Department of Justice sought to compel Google to produce thousands of users’ individual search queries. To its credit, Google fought the government’s request. In a declaration submitted to the court describing the kind of personal information that can end up in the company’s search query logs, Matt Cutts, a Senior Staff Engineer at Google, stated:

“There are ways in which a search query alone may reveal personally identifying information. For example, many internet users have experienced the mistake of trying to copy‐and‐paste text into the search query box, only to find that they have pasted something that they did not intended. Because Google allows very long queries, it is possible that a user may paste a fragment of an or a document that would tie the query to a specific person. Users could also enter information such as a credit card, a social security number, an unlisted phone number or some other information that can only be tied to one person. Some people search for their credit card or social security number deliberately in order to check for identity theft or to see if any of their personal information is findable on the Web.”11

INTRODUCTION TO HTTP REFERRER HEADERS

14. When a consumer visits a web page using their computer or mobile device, every major web browser (Internet Explorer, Firefox, Chrome, Safari) by default reports the last page that the consumer viewed before clicking on a link and visiting the current page (that is, the page that “referred” them to the current page). This information is transmitted in the HTTP Referer (sic) header (“referrer header”).12

15. The original technical standard, or Request For Comments (RFC) document that outlines the HTTP specification notes that this header can include private information, and advises web browser designers to include privacy protecting features in their products that will allow users to protect themselves:

“Because the source of a link may be private information or may reveal an otherwise private information source, it is strongly recommended that the user be able to select whether or not the Referer field is sent. For example, a browser client could have a toggle switch for browsing openly/anonymously, which would respectively enable/disable the sending of Referer and From information.”13

11 Declaration of Matt Cutts, February 17, 2006, in Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006) at page 9, available at: http://docfiles.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/14/0.pdf

12 The term “referer” was misspelled in the original technical standards document, and thus, this incorrect spelling is also used in many other technical documents.

16. Although this 15‐year old technical standard recommends that browser vendors allow users to control the transmission of the referrer header, not all have done so, and none make it easy.

17. Google’s Chrome browser can be configured to not transmit referrer information. Users can enable this feature with an obscure, poorly documented parameter (--no-referrers) that must be entered when the browser starts.

Figure 1: Users wishing to disable the Chrome referrer header must use a special parameter when starting the application. One way of doing this is from the command line.

18. Mozilla’s Firefox browser also includes a user‐controlled preference that disables the transmission of the header, although it is neither easy to use nor discoverable via a menu option. Users of Firefox must first type a special address (about:config) into the location bar, navigate past a scary warning, and then locate the referrer configuration option among the hundreds available, which they must change from “1” to “2”.

13 T. Berners‐Lee et al., “Hypertext Transfer Protocol ‐‐ HTTP/1.0”, IETF Request for Comments: 1945, May 1996, available at: http://www.ietf.org/rfc/rfc1945.txt.

Figure 2: Users of Mozilla Firefox are greeted with scary text if they attempt to modify the advanced browser configuration settings.

Figure 3: Users wishing to disable the referrer header must modify an obscure browser configuration setting from "1" to "2".

19. Third party developers have attempted to make it easier for users to protect their privacy online, by creating browser plugins for Firefox and Chrome that can disable the transmission of the referrer header with a single mouse click.14 These third party tools are not widely used, nor prominently advertised by the browser vendors.

20. Apple’s Safari and Microsoft’s Internet Explorer browsers do not include any built‐in referrer disabling functionality. Users of these two browsers make up the majority of the browser market (a combined 65%).15

21. Danny Sullivan, a widely respected search engine industry analyst, has written that the HTTP referrer header is “little known to most web surfers,” but that it is “effectively the Caller ID of the internet. It allows web site owners and marketers to know where visitors came from.”16

22. Describing the widespread leakage to third parties of referrer headers and other information about users’ activities online, Professor Ed Felten told the New York Times earlier this year that “[t]he browser needs to be less promiscuous about revealing the information collected.”17

23. The URLs of web pages often include sensitive information, which can be inadvertently leaked to third parties via the browsers’ transmission of the HTTP referrer header. One prominent example of this is the leakage of social network user identifiers, age and gender information to third party advertisers, a problem that was first highlighted by researchers Dr. Balachander Krishnamurthy and Professor Craig E. Wills in 2009.18

14 Rhashemian, “Noref” (Chrome Browser add‐on), June 07, 2010, available at: https://chrome.google.com/extensions/detail/dkpkjedlegmelkogpgamcaemgbanohip. Tito Bouzout, “No Referrer” (Firefox browser add‐on), June 7, 2010, available at: https://addons.mozilla.org/en‐US/firefox/addon/86093/.

15 Netmarkershare, “Browser Market Share”, August 2010, available at: http://marketshare.hitslink.com/report.aspx?qprid=0.

16 Danny Sullivan, “The Death Of Web Analytics? An Ode To The Threatened Referrer”, Search Engine Land, May 25, 2010, available at: http://searchengineland.com/the‐death‐of‐web‐analytics‐an‐ode‐to‐the‐referrer‐42875.

17 Steve Lohr, “Redrawing the Route to Online Privacy,” The New York Times, February 27, 2010, available at: http://www.nytimes.com/2010/02/28/technology/internet/28unbox.

18 B. Krishnamurthy and C. E. Wills, “On the leakage of personally identifiable information via online social networks,” SIGCOMM Comput. Commun. Rev. 40, 1 (Jan. 2010), 112‐117, available at: http://conferences.sigcomm.org/sigcomm/2009/workshops/wosn/papers/p7.pdf. Emily Steel and Jessica E. Vascellaro, “Facebook, MySpace Confront Privacy Loophole,” The Wall Street Journal, May 21, 2010, available at: http://online.wsj.com/article/SB10001424052748704513104575256701215465596.html. Ben Edelman, “Facebook Leaks Usernames, User IDs, and Personal Details to Advertisers”, May 20, 2010, available at: http://www.benedelman.org/news/052010‐1.html.

24. Eight months after being notified of private data leakage issues by these researchers, Facebook deployed a comprehensive system to scrub browser headers of any referrer data when consumers navigate from the company’s web site to a third party. In describing the new scrubbing feature, Facebook Engineer Matt Jones wrote:

“But sometimes referrers just don’t belong – maybe there is sensitive information in a URL, or maybe a site just doesn’t want its users’ browsers telling others how they use the site. While most browsers give their users the option to disable this feature, not everyone does so, and there is no way for a web site to explicitly tell a browser not to send a referrer.

Facebook is one site where referrers don’t really belong. As part of our continued efforts to protect users’ privacy, we proactively protect our users from exposing how they navigated to an external site.”19

GOOGLE’S EMBEDDING OF SEARCH TERMS IN THE RESULTS PAGE URL

25. Since the service’s launch, Google’s search engine has included the search terms in the URL of the search results page. Thus, for example, a search for “abortion clinics in Indianapolis” would return a page with a URL similar to

http://www.google.com/search?q=abortion+clinics+in+Indianapolis

26. Because the search terms were included in the search results URL, when a user clicked on a link from the search results page, the owner of the web site that the user then visited would receive the search terms in the referrer header.

27. Several web analytics services include functionality to automatically parse the search query information from logs, or to otherwise collect the search query from the referrer header transmitted by each visitor’s web browser. For example, Google’s own analytics product provides webmasters with this information at an aggregate level (revealing how many visitors were drawn by particular search terms).
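The parsing step described in paragraphs 26 and 27 can be illustrated with a short sketch. It is not taken from any analytics product; the function name `search_terms_from_referrer` and the simple host check are assumptions made for illustration, and only the Python standard library is used.

```python
from urllib.parse import urlsplit, parse_qs


def search_terms_from_referrer(referrer):
    """Return the search query carried in a Google referrer URL, or None.

    Hypothetical helper: real analytics tools recognise many more hosts
    and parameters than this sketch does.
    """
    parts = urlsplit(referrer)
    # Simplified host check (an assumption of this sketch).
    if not parts.netloc.endswith("google.com"):
        return None
    # parse_qs decodes '+' and percent-escapes in the query string.
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


# The referrer a destination site would receive for the search in paragraph 25.
print(search_terms_from_referrer(
    "http://www.google.com/search?q=abortion+clinics+in+Indianapolis"))
```

Any site that logs the referrer header can run a step like this on every incoming visit, which is why the query reaches third parties without further action by the user.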

19 Matt Jones, “Protecting Privacy with Referrers,” May 26, 2010, available at: https://www.facebook.com/notes/facebook‐engineering/protecting‐privacy‐with‐referrers/392382738919.

Figure 4: Google’s analytics service, showing aggregate search query information derived from referrer headers (listed here as “keywords”).

THE PRIVACY IMPACT OF GOOGLE’S AJAX ENABLED SEARCH RESULTS PAGE

28. Starting approximately in November 2008, Google began to test a new method of delivering search results that uses advanced AJAX (Asynchronous JavaScript and XML) technologies.20 AJAX is one of the key pillars of the Web 2.0 experience.21 This pilot was initially deployed in the Netherlands,22 but in subsequent months, was observed by users in other countries.

20 Jesse James Garrett, “Ajax: A New Approach to Web Applications”, February 18, 2005, available at: http://www.adaptivepath.com/ideas/essays/archives/000385.php (“Ajax isn’t a technology. It’s really several technologies, each flourishing in its own right, coming together in powerful new ways.”)

21 “AJAX is also a key component of Web 2.0 applications such as Flickr, now part of Yahoo!, 37signals' applications basecamp and backpack, as well as other Google applications such as and .” Tim O’Reilly, “What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software,” September 30, 2005, available at: http://oreilly.com/web2/archive/what‐is‐web‐20.html.

22 Ulco, “Google Search in AJAX?!,” November 19, 2008, available at: http://www.ulco.nl/gibberish/google‐search‐in‐ajax.

29. One of the side effects of the AJAX search page is that the URL of the search results page includes the search query terms after a # symbol in the URL. Thus, on an AJAX enabled search page, the URL listed at the top of the page will be similar to:

http://www.google.com/#hl=en&source=hp&q=drug+addiction

30. The addition of the # symbol had a significantly positive, albeit unintentional impact upon user privacy. This is because web browsers do not pass on any information after the # symbol in the referrer header. Thus, using the previous example of a search for the query “drug addiction,” if a user clicked on the first result, the owner of that web site would only receive “http://www.google.com/” in the referrer header, rather than the search terms that follow the # symbol.
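The effect described in paragraph 30 can be reproduced with a minimal sketch. The function `referrer_for` is a hypothetical stand‐in for the browser behaviour (it only models the removal of the fragment, not any other referrer rules), and the "classic" URL is an illustrative one built in the format shown in paragraph 25.

```python
from urllib.parse import urldefrag


def referrer_for(page_url):
    """Approximate the Referer value a browser sends after a click on page_url.

    Browsers drop the fragment (everything from '#' onward) before
    sending the header; this sketch models only that step.
    """
    return urldefrag(page_url).url


classic = "http://www.google.com/search?q=drug+addiction"    # illustrative
ajax = "http://www.google.com/#hl=en&source=hp&q=drug+addiction"

print(referrer_for(classic))  # query string, including q=, is preserved
print(referrer_for(ajax))     # only http://www.google.com/ remains
```

The search terms survive in the first case because they sit in the query string, and vanish in the second because they sit in the fragment.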

31. This change was immediately noticed by the webmaster and search engine optimization community, who complained to Google:

“I'm seeing hundreds of these empty google referers today and wondered what was going on.”23

“This means organic searches from Google will now show up as just http://www.google.com/, with no search parameters. In other words, no analytics app can track these searches anymore. I started noticing lots of hits from just ‘http://www.google.com/’ recently in our own search logs. I thought maybe it was just a bug with Clicky. But then one of our users contacted me about this article, and my jaw about broke from hitting the floor so hard.”24

“What actually breaks if Google makes this switchover, and is in fact broken during any testing they are doing, is much more widespread. Every single analytics package that currently exists, at least as far as being able to track what keywords were searched on to find your site in Google, would no longer function correctly.”25

23 Comment by Sorabji.com, Clicky.blog, Feb 03 2009 1:05pm, available at: http://getclicky.com/blog/150/googles‐new‐ajax‐powered‐search‐results‐breaks‐search‐keyword‐tracking‐for‐everyone.

24 Clicky.blog, February 03, 2009, available at: http://getclicky.com/blog/150/googles‐new‐ajax‐powered‐search‐results‐breaks‐search‐keyword‐tracking‐for‐everyone.

25 Michael VanDeMar, “What Will *Really* Break If Google Switches To AJAX…?,” Smackdown! (blog), February 2, 2009, available at: http://smackdown.blogsblogsblogs.com/2009/02/02/what‐will‐really‐break‐if‐google‐switches‐to‐ajax/.

32. Responding to complaints from the webmaster community, Google quickly issued a public statement:

Currently AJAX results are just a test on Google. At this time only a small percentage of users will see this experiment. It is not our intention to disrupt referrer tracking, and we are continuing to iterate on this project and are actively working towards a solution. As we continue experiments, we hope that this test may ultimately provide an easier solution for our customers and a faster experience for our users.26

33. Google soon ended the test of the AJAX search results page, a fact confirmed by Google Senior Engineer Matt Cutts:

“[T]he team didn’t think about the referrer aspect. So they stopped [the test]. They’ve paused it until they can find out how to keep the referrers.”27

THE PRIVACY IMPACT OF GOOGLE’S REDIRECTION TOOL

34. For more than ten years, Google has tracked the links that users click on from the search results page. During the search engine’s first few years, this was done by redirecting all clicks from the search engine results page through a script on Google’s servers, before then redirecting the users’ browser to the server hosting the content they actually wished to view.

35. Google has long disclosed this practice, starting in its first privacy policy from August 14, 2000:

“Google may choose to exhibit its search results in the form of a ‘URL redirecter.’ When Google uses a URL redirecter, if you click on a URL from a search result, information about the click is sent to Google, and Google in turn sends you to the site you clicked on. Google uses this URL information to understand and improve the quality of Google’s search technology. For instance, Google uses this information to determine how often users are satisfied with the first result of a query and how often they proceed to later results.”28

26 Matt McGee, “Google AJAX Search Results = Death To Search Term Tracking?,” Search Engine Land, February 3, 2009, available at: http://searchengineland.com/google‐ajax‐search‐results‐death‐to‐search‐term‐tracking‐16431.

27 Lisa Barone, “Keynote Address – Matt Cutts, Google,” March 12, 2009, available at: http://outspokenmedia.com/internet‐marketing‐conferences/pubcon‐keynote‐matt‐cutts/.

28 Google Privacy Policy (archived), August 14, 2000, available at: http://www.google.com/privacy_archive_2000.html.

36. The specific technology used by Google to track clicks has varied over time and did not always rely on redirection. For example, starting in 2003, the service used “hidden” JavaScript functions that were triggered when the user clicked their mouse on the search results page.29 Subsequent versions of Google’s privacy policy continued to disclose that the company was tracking the links that users clicked on, but did away with the specific “redirecter service” language used in the company’s first privacy policy. For example, the company’s privacy policy in 2004 stated that:

“Google may present links in a format that enables us to understand whether they have been followed. We use this information to understand and improve the quality of Google’s search technology. For instance, this data helps us determine how often users are satisfied with the first result of a query and how often they proceed to later results.”30

37. In March 2009, Google again began to test the use of redirection based on some results pages. One unintentional side effect of this redirection script was that it caused the users’ search terms to be stripped from the referrer header later transmitted to web sites. This is because the URL of the redirection script did not contain the search terms.

An example of the format of the redirection script URL that was in use in March 2009 is:

http://www.google.com/url?q=http://www.webmd.com&ei=in66ScnjBtKgtwfn0LTiDw&sa=X&oi=smap&resnum=1&ct=result&cd=1&usg=AFQjCNF9RdVC6vXBFOYvdia1s_ZE_BMu8g

38. In March 2009, Michael VanDeMar, a prominent member of the search engine optimization (SEO) community, noticed that he was again seeing AJAX based search results in addition to redirected URLs for every link in the search results page:

“Occasionally you will see these Google redirects in the normal [search engine results pages] as well, although usually not. The thing is, I was seeing them on every search I performed. It struck me as odd, until I suddenly realized that every search was being done via AJAX.”31

29 “Click Tracking at Google (Hidden)”, forum thread at Webmaster World, November 20, 2003, available at: http://www.webmasterworld.com/forum3/18425.htm. “Google's Click Tracking script change to 3 parameters”, forum thread at Webmaster World, June 12, 2004, available at http://www.webmasterworld.com/forum3/24367.htm.

30 Google Privacy Policy (archived), July 1, 2004, available at: http://www.google.com/privacy_archive_2004.html

39. Google’s Matt Cutts soon responded to VanDeMar by leaving a comment on his blog:

“Hi Michael, I checked with some folks at Google about this. The redirection through a redirector was separate from any AJAX‐ enhanced search results; we do that for some experiments, but it’s not related to the JavaScript‐enhanced [AJAX] search results.

The solution to the referrer problem will be coming online in the future. It uses a JavaScript‐driven redirect that enables us to pass the redirect URL as the referrer. This URL will contain a ‘q’ param that matches the user’s query.”

40. On April 14, 2009, Google announced that it would be deploying the URL redirection tool for all links in the search results. The company described the details in a blog post addressed to the webmaster community:

“Starting this week, you may start seeing a new referring URL format for visitors coming from Google search result pages. Up to now, the usual referrer for clicks on search results for the term "flowers", for example, would be something like this:

http://www.google.com/search?hl=en&q=flowers&btnG=Google+Search

Now you will start seeing some referrer strings that look like this:

http://www.google.com/url?sa=t&source=web&ct=res&cd=7&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&ei=0SjdSa‐1N5O8M_qW8dQN&rct=j&q=flowers&usg=AFQjCNHJXSUh7Vw7oubPaO3tZOzz‐F‐u_w&sig2=X8uCFh6IoPtnwmvGMULQfw

…. The new referrer URLs will initially only occur in a small percentage of searches. You should expect to see old and new forms of the URLs as this change gradually rolls out.”32

31 Michael VanDeMar, “Google Re‐initiates Testing of AJAX SERP’s With Faulty Proposed Fix,” Smackdown! (blog), March 13, 2009, available at: http://smackdown.blogsblogsblogs.com/2009/03/13/google‐re‐initiates‐testing‐of‐ajax‐serps‐with‐faulty‐proposed‐fix/.

41. The redirection tool that Michael VanDeMar described in March 2009 did not include the search terms in its URL (and thus, these terms were not subsequently transmitted to webmasters via the browser’s referrer header). However, one month later when Google announced that it would be using the redirection tool for all links, the redirection script was changed to include the search terms in the redirection URL (via a new “q” parameter), thus guaranteeing that webmasters would not lose access to user search query data.

42. The new redirection tool also leaks data to web site administrators that had never before been available to anyone but Google: the item number of the search result that was clicked on (e.g. the 3rd link or 5th link from the search results page).33 Matt Cutts confirmed the leakage of this additional information, which he described as a benefit to web site administrators:

“I think if you do experiments, you'll be able to confirm your speculation … I think this is awesome for webmasters‐‐even more information than you could glean from the previous referrer string.”34
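The difference between the March 2009 and April 2009 redirection URLs discussed in paragraphs 37 through 42 can be seen by parsing the two example URLs quoted in this complaint. The sketch below is illustrative only: `redirect_params` is a hypothetical helper, the opaque `ei`, `usg` and `sig2` parameters are omitted for brevity, and reading `cd` as the position of the clicked result is an assumption, since the complaint does not name the parameter.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_params(redirect_url):
    """Return the q, url and cd parameters of a google.com/url link.

    Hypothetical helper; a missing parameter is reported as None.
    """
    params = parse_qs(urlsplit(redirect_url).query)
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: params.get(name, [None])[0] for name in ("q", "url", "cd")}


# March 2009 format: 'q' holds the destination, not the search terms.
march_2009 = ("http://www.google.com/url?q=http://www.webmd.com"
              "&sa=X&oi=smap&resnum=1&ct=result&cd=1")

# April 2009 format: 'url' holds the destination and 'q' holds the query.
april_2009 = ("http://www.google.com/url?sa=t&source=web&ct=res&cd=7"
              "&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&rct=j&q=flowers")

print(redirect_params(march_2009))
print(redirect_params(april_2009))
```

Under these assumptions, a site receiving the April 2009 URL as a referrer learns both the query ("flowers") and a value that appears to indicate the clicked result's position ("7"), whereas the March 2009 URL reveals neither.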

43. A May 2009 video featuring Matt Cutts, posted to the official GoogleWebmasterHelp YouTube channel, describes the change in the search query information leaked via the referrer header:

“[T]here is a change on the horizon and it's only a very small percentage of users right now, but I think that it probably will grow and it will grow over time where Google's referrer, that is whenever you do a Google search and you click on a result, you go to another and your browser passes along a value called a referrer. That referrer string will change a little bit.

It used to be google.com/search, for example.

Now, it will be google.com/url.

32 Brett Crosby, “An upcoming change to Google.com search referrals; unaffected,” Google Analytics Blog, April 14, 2009, available at: http://analytics.blogspot.com/2009/04/upcoming‐change‐to‐googlecom‐search.html

33 Patrick Altoft, “Google Adds Ranking Data to Referrer String,” Blogstorm, April 15, 2009, available at: http://www.blogstorm.co.uk/google‐adds‐ranking‐data‐to‐referrer‐string/

34 Matt Cutts, Blog comment left at 15 Apr 2009 at 7:28 pm, available at: http://www.blogstorm.co.uk/google‐adds‐ranking‐data‐to‐referrer‐string/#IDComment77457344

And for a short time we didn't have what the query was which got a lot of people frustrated, but the google.com/search, the new Google referrer string will have the query embedded in it.

And there's a really interesting tidbit that not everybody knows, which is‐ ‐it also has embedded in that referrer string a pretty good idea of where on the page the click happened.

So, for example, if you were result number one, there's a parameter in there that indicates the click came from result number one. If you were number four, it will indicate the click came from, result number four. So, now, you don't necessarily need to go scraping Google to find out what your rankings were for these queries. You can find out, "Oh, yeah. I was number one for this query whenever someone clicked on it and came to my website."

So that can save you a ton of work, you don't need to worry nearly as much, you don't have to scrape Google, you don't have to think about ranking reports. Now, we don't promise that these will, you know, be a feature that we guarantee that we'll always have on Google forever but definitely take advantage of it for now. …. [F]or the most part, this gives you a very accurate idea of where on the page you were, so you get all kinds of extra information that you can use in your analytics and to compute your ROIs without having to do a lot of extra work. So, if you can, it's a good idea to look at that referrer string and start to take advantage of that information.”35

GOOGLE’S URL REDIRECTION TOOL INTENTIONALLY NEGATES THE SEARCH TERM REFERRER SCRUBBING CAUSED BY GOOGLE’S AJAX SEARCH PAGE

44. As of July 2010, it appears that Google has widely deployed both the AJAX based search results and the redirection tool. As such, even though the URL listed in the browser’s location bar for the search results page contains a # symbol, the search terms are still leaked to the web site that the user clicks on, via the browser’s referrer header.

35 Matt Cutts, “Can you talk about the change in Google's referrer string?,” GoogleWebMasterHelp Channel, May 6, 2009, available at: http://www.youtube.com/watch?v=4XoD4XyahVw

Thus, for example, if a user conducts a Google search for the terms “HIV testing,” the AJAX enabled results page will contain a # symbol before the search terms in the URL.

Figure 5: Google uses AJAX based technology to deliver search results. One side effect of this is that a # symbol is in the search results URL.

When the user clicks on the link in search results, their browser will first connect to Google’s redirection tool. Because the search results page URL has a # symbol before the query terms, the referrer header transmitted by the browser to Google’s redirection service does not include the users’ search terms. A snapshot of the header information for such a connection can be seen below:36

http://www.google.com/url?sa=t&source=web&cd=1&ved=0CB8QFjAA&url=http%3A%2F%2Fwww.hivtest.org%2F&rct=j&q=HIV%20testing&ei=kFM‐TK6yHYX80wSjpJm2Aw&usg=AFQjCNHNhIBxfjO_1Vn_pln_XQs8HAyIFA

GET /url?sa=t&source=web&cd=1&ved=0CB8QFjAA&url=http%3A%2F%2Fwww.hivtest.org%2F&rct=j&q=HIV%20testing&ei=kFMTK6yHYX80wSjpJm2Aw&usg=AFQjCNHNhIBxfjO_1Vn_pln_XQs8HAyIFA HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
[…]
Referer: http://www.google.com/

36 This information was collected with the Mozilla Firefox add‐on “Live Headers” (https://addons.mozilla.org/en‐US/firefox/addon/3829/), and can be easily replicated by doing a Google search query while running the add‐on.
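The fragment behavior shown in the capture above can be illustrated with a minimal Python sketch. It assumes a hypothetical AJAX results URL of the form http://www.google.com/#q=HIV+testing (the actual URL appears only in Figure 5), and it models only the rule that a browser omits everything from the # symbol onward when it builds the referrer header. The function name referrer_for is illustrative, not part of any browser or Google interface.

```python
from urllib.parse import urldefrag


def referrer_for(page_url: str) -> str:
    """Return the URL a browser would report as the referrer for page_url.

    Browsers drop the fragment (the '#' symbol and everything after it)
    before sending the referrer header, so only the part before '#' remains.
    """
    return urldefrag(page_url).url


# Hypothetical AJAX-style results URL: the search terms sit after the '#'.
ajax_results_url = "http://www.google.com/#q=HIV+testing"

# Hypothetical classic results URL: the search terms sit in the query string.
classic_results_url = "http://www.google.com/search?q=HIV+testing"

print(referrer_for(ajax_results_url))     # search terms are not included
print(referrer_for(classic_results_url))  # search terms are included
```

Under these assumptions, the first call yields http://www.google.com/, which matches the Referer line in the capture above, while the second call returns the URL unchanged, search terms included.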

However, because Google also includes the user’s search terms in the URL of the redirection tool, when the user’s browser is subsequently redirected to the actual web site whose link they clicked on, the search terms are transmitted to the web site in the referrer header. An example of this can be seen below:

http://www.hivtest.org/

GET / HTTP/1.1
Host: www.hivtest.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
[…]
Referer: http://www.google.com/url?sa=t&source=web&cd=1&ved=0CB8QFjAA&url=http%3A%2F%2Fwww.hivtest.org%2F&rct=j&q=HIV%20testing&ei=kFMTK6yHYX80wSjpJm2Aw&usg=AFQjCNHNhIBxfjO_1Vn_pln_XQs8HAyIF
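To illustrate how little effort a receiving web site needs to recover the search terms from such a referrer header, the following Python sketch parses the q parameter out of a redirection URL. It is a sketch only: the referrer string is an abbreviated copy of the one captured above (the ved, ei and usg parameters are left out for brevity), and the function name search_terms_from_referrer is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer: str) -> Optional[str]:
    """Return the decoded value of the 'q' parameter in a referrer URL, if any."""
    params = parse_qs(urlsplit(referrer).query)
    values = params.get("q")
    return values[0] if values else None


# Abbreviated form of the referrer sent to www.hivtest.org in the capture
# above (the ved, ei and usg parameters are omitted here).
leaky_referrer = (
    "http://www.google.com/url?sa=t&source=web&cd=1"
    "&url=http%3A%2F%2Fwww.hivtest.org%2F&rct=j&q=HIV%20testing"
)

# Referrer received by Google's redirection tool from the AJAX results page.
scrubbed_referrer = "http://www.google.com/"

print(search_terms_from_referrer(leaky_referrer))     # the user's search terms
print(search_terms_from_referrer(scrubbed_referrer))  # None: nothing to recover
```

Any analytics script or server log processor could apply the same few lines to every incoming request, which is why including the q parameter in the redirection URL restores the leakage that the # symbol would otherwise have prevented.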

WHEN GOOGLE HAS ACCIDENTALLY SCRUBBED SEARCH QUERY INFORMATION FROM REFERRER HEADERS, IT HAS QUICKLY WORKED TO RESTORE THE LEAKAGE IN ORDER TO PROVIDE THE INFORMATION TO WEB SITE ADMINISTRATORS.

45. As documented several times already in this complaint, each time Google has accidentally stopped the leakage of search terms to web sites via browser referrer headers, it has quickly worked to restore the transmission of the data. For example, after the company initially piloted the AJAX based search page, the company issued a public statement which revealed that:

“It is not our intention to disrupt referrer tracking.”37

46. In or around July 2010, Google began stripping the search terms from the referrer headers transmitted by a small percentage of browsers. On July 13, 2010, individuals in the Search Engine Optimization (SEO) community noticed the change made by Google. One commentator in a web forum wrote that:

37 Matt McGee, “Google AJAX Search Results = Death To Search Term Tracking?,” Search Engine Land, February 3, 2009, available at: http://searchengineland.com/google‐ajax‐search‐results‐death‐to‐search‐term‐tracking‐16431.


“More and more visits from Google in my server log are without exact referrer information, and have only ‘http://www.google.com’, ‘http://www.google.com.au’, etc. which doesn't allow to find out keyword and SERP [search engine results] page from which this visit was made.”38

47. On July 13, 2010, Google’s Matt Cutts posted a message to the same SEO forum:

“Hey everybody, I asked folks who would know about this. It turns out there was an issue a couple weeks ago where some code got refactored, and the refactoring affected referrers for links opened in a new tab or window. Right now the team is expecting to have a fix out in the next week or so. Hope that helps.”39

48. This serves as one additional example of the fact that Google considers the real‐time transmission of individual users’ search queries to web site administrators to be a high priority, and an intentional feature.

GOOGLE HAS ACKNOWLEDGED THE PRIVACY BENEFITS OF PROTECTING SEARCH TERMS FROM REFERRER HEADER LEAKAGE

49. On May 21, 2010, Google announced a new SSL encrypted search engine page, accessible at https://www.google.com.40 According to the company’s announcement, the service “helps protect your search terms and your search results pages from being intercepted by a third party on your network.”41 The service is not enabled by default, nor linked to from the search engine’s home page, and so users must navigate to the special URL in order to receive the privacy and security benefits of encrypted search.

38 “More and more referrals from Google are without exact referrer string,” forum thread at Webmaster World, July 13, 2010, available at: http://www.webmasterworld.com/google/4168949.htm

39 “More and more referrals from Google are without exact referrer string,” forum thread at Webmaster World, July 13, 2010, available at: http://www.webmasterworld.com/google/4168949.htm

40 This was later moved to https://encrypted.google.com.

41 Evan Roseman, “Search more securely with encrypted Google web search,” The Official Google Blog, May 21, 2010, available at: http://googleblog.blogspot.com/2010/05/search‐more‐securely‐with‐encrypted.html

50. Web browsers, per technical standards, do not pass the referrer header when a user clicks on an HTTP (insecure) link from an HTTPS (secure) web site.42 Thus, if a user clicks on a link to the New York Times (http://www.nytimes.com) from the results page of a secure Google search, the New York Times web servers will not receive any information in the referrer header.
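The rule described in the preceding paragraph can be sketched in a few lines of Python. This is a simplified model of the recommendation in RFC 2616 quoted in footnote 42 only, not a description of any particular browser's full behavior. The function name sends_referrer and the search URL used in the example are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def sends_referrer(source_url: str, target_url: str) -> bool:
    """Model the RFC 2616 recommendation on referrers and secure pages.

    A client should not include a referrer header in a non-secure (HTTP)
    request when the referring page was transferred over a secure protocol
    (HTTPS). In every other case this simplified model sends the header.
    """
    source_is_secure = urlsplit(source_url).scheme == "https"
    target_is_insecure = urlsplit(target_url).scheme == "http"
    return not (source_is_secure and target_is_insecure)


# Hypothetical results page URLs for the encrypted and unencrypted services.
secure_results = "https://www.google.com/search?q=HIV+testing"
insecure_results = "http://www.google.com/search?q=HIV+testing"

print(sends_referrer(secure_results, "http://www.nytimes.com"))    # False
print(sends_referrer(insecure_results, "http://www.nytimes.com"))  # True
```

In this model, a click from the encrypted results page to the New York Times carries no referrer header, whereas the same click from the unencrypted results page does.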

51. The official help page for the Google SSL search feature notes that use of the service also results in the scrubbing of the referrer header, and describes this as an additional privacy feature:

“As another layer of privacy, SSL search turns off a browser's referrers. Web browsers typically turn off referrers when going from HTTPS to HTTP mode to provide extra privacy. By clicking on a search result that takes you to an HTTP site, you could disable any customizations that the website provides based on the referrer information.”43

42 R. Fielding et al., “Hypertext Transfer Protocol ‐‐ HTTP/1.1”, IETF Request for Comments: 2616, June 1999, available at: http://www.w3.org/Protocols/rfc2616/rfc2616.html. (“Clients SHOULD NOT include a Referer [sic] header field in a (non‐secure) HTTP request if the referring page was transferred with a secure protocol.”).

43 “Features: SSL Search,” Google Web Search Help, 2010, available at: http://www.google.com/support/websearch/bin/answer.py?answer=173733&hl=en

GOOGLE HAS LONG PROTECTED GMAIL INBOX SEARCH QUERIES FROM LEAKING VIA REFERRER HEADERS

52. If a user of Google’s Gmail service searches their email inbox, the results page will have a URL similar to https://mail.google.com/mail/#search/HIV+test+results.

Figure 6: The URL for searches conducted within Google Mail includes a # symbol before the search terms, thus protecting the terms from leakage via the browser's referrer header if the user clicks on a link in an email to a different web site.

53. Google takes proactive steps to shield inbox search query information from leaking via referrer headers to third parties, such as when a user clicks on a link in an email message. Google describes this practice as follows:

“Google also takes several steps to guard the confidentiality of users' information by offering a number of industry‐leading protections

Minimized ‘referrer’ header information. When you click on links in , the web browser that loads contains a referrer header. When you click on links in Gmail, Google takes steps to eliminate this referrer header, preventing others from knowing that you clicked on a link from an email.”44

44 “More on Gmail and privacy,” December 2009, available at: http://mail.google.com/mail/help/about_privacy.html

54. It is unclear why the company has decided that the terms searched for when looking through an email inbox are sensitive enough to not be shared with third parties, yet the terms queried using the company’s search engine are wholly undeserving of similar privacy protections, and can thus be freely shared with third parties.

GOOGLE ALREADY PROVIDES WEBMASTERS WITH PRIVACY‐PRESERVING AGGREGATE SEARCH QUERY INFORMATION

55. Search Engine industry analyst Danny Sullivan has written that “one of the most important online marketing tools is the referrer string…. When you understand the search terms someone used to reach your site, you understand how successful your search marketing activities are….. The referrer is what makes internet marketing so measurable, so performance‐driver and so unlike traditional marketing, where so little is measured.”45

56. In addition to providing real‐time search query data associated with an individual visit to web sites via the referrer header, Google has long provided web site owners with high‐level aggregate information on the search queries that take users to their web sites. This is a free service, available to any web site administrator. A screenshot of the Google Webmaster Center page can be seen below:

45 Danny Sullivan, “The Death Of Web Analytics? An Ode To The Threatened Referrer”, Search Engine Land, May 25, 2010, available at: http://searchengineland.com/the‐death‐of‐web‐analytics‐an‐ode‐to‐the‐referrer‐42875.


Figure 7: The dashboard of Google's Webmaster Tools site, showing the search terms ("keywords") that lead users to a particular web site run by that webmaster.

57. It is certainly true that web site administrators can benefit from being able to learn the search queries that draw users to their web sites. However, they can obtain many of these benefits through high level aggregate statistics, without learning in real time which particular queries resulted in a visit by an individual user.

58. If Google were to scrub the referrer header, web site administrators would still have access to these aggregate statistics, thus enabling them to measure the specific terms that draw visitors to their web sites.

GOOGLE’S SHARING OF SEARCH QUERY DATA IS NOT SIMPLY “HOW THE INTERNET WORKS”

59. Google may attempt to argue, as other companies have when confronted by their own leakage of personal data, that the transmission of search query data via the referrer header is “just how the Internet and browsers work.”46 As the evidence compiled in this complaint clearly demonstrates, Google knowingly leaks search queries through the referrer header and has taken steps to restore the leakage of this data on numerous occasions after it accidentally stopped providing the information. Furthermore, the steps taken by Google to stop similar search query leakage from Gmail and its encrypted search service, and similar steps taken recently by Facebook and MySpace,47 demonstrate that Google could easily stop leaking search query information from its main search engine if it wished to do so.

GOOGLE’S INTENTIONAL LEAKAGE OF USERS’ SEARCH QUERY TERMS VIA THE BROWSER REFERRER HEADER IS A DECEPTIVE BUSINESS PRACTICE

60. According to the FTC Policy Statement on Deception, there are three elements to any deception case.48 First, there must be a representation, omission or practice that is likely to mislead the consumer. Second, the practice must be deceptive from the perspective of the average consumer. Third, the representation, omission, or practice must be a "material" one; that is, the act or practice must be likely to affect the consumer's conduct or decision with regard to a product or service.

61. Google’s practice of allowing (and enabling) the leakage of private search query data to third parties is a deceptive business practice. The reasons for this are described below.

62. Google’s privacy policy misleads consumers, and assures its customers that it will only share their personally identifying information with third parties under a limited set of circumstances. This policy does not reflect the fact that the company knowingly and actively assists in the leakage of this data to third parties.

46 Emily Steel and Jessica E. Vascellaro, “Facebook, MySpace Confront Privacy Loophole,” The Wall Street Journal, May 21, 2010, available at: http://online.wsj.com/article/SB10001424052748704513104575256701215465596.html.

47 Emily Steel and Jessica E. Vascellaro, “Facebook, MySpace Confront Privacy Loophole,” The Wall Street Journal, May 21, 2010, available at: http://online.wsj.com/article/SB10001424052748704513104575256701215465596.html.

48 FTC Policy Statement on Deception, October 14, 1983, available at: http://www.ftc.gov/bcp/policystmt/ad‐decept.htm

The Information Sharing section of Google’s Privacy Policy states that:

“Google only shares personal information with other companies or individuals outside of Google in the following limited circumstances:

• We have your consent. We require opt‐in consent for the sharing of any sensitive personal information.

• We provide such information to our subsidiaries, affiliated companies or other trusted businesses or persons for the purpose of processing personal information on our behalf. We require that these parties agree to process such information based on our instructions and in compliance with this Privacy Policy and any other appropriate confidentiality and security measures.

• We have a good faith belief that access, use, preservation or disclosure of such information is reasonably necessary to (a) satisfy any applicable law, regulation, legal process or enforceable governmental request, (b) enforce applicable Terms of Service, including investigation of potential violations thereof, (c) detect, prevent, or otherwise address fraud, security or technical issues, or (d) protect against harm to the rights, property or safety of Google, its users or the public as required or permitted by law.”49

63. Allowing users’ search terms to leak via the browser’s referrer header to every web site that the user clicks on from a Google search results page simply does not qualify as a “limited circumstance.”

64. Google may claim that it is not sharing any data with third parties, and thus has not violated its privacy policy. At a purely technical level, it is true that search queries are not directly transmitted by Google’s servers to the web servers of third parties. Instead, the company intentionally places users’ search terms into the search query URL, with the full knowledge that this will result in the transmission of search query information by users’ web browsers to third parties via the referrer header.

65. The distinction between Google delivering the data to third parties itself, and Google instructing users’ browsers to do so is essentially meaningless, particularly when most users have no easy method by which to scrub the referrer header themselves.

49 Google Privacy Policy, March 11, 2009, http://www.google.com/privacypolicy.html.

66. Google may also attempt to argue that search queries are not personal information as it has narrowly defined the term in its privacy policy. However, the company has previously acknowledged that search queries often contain personally identifiable information. For example, when Google fought the Department of Justice’s request in 2006 for search query data, the company argued that:

“This is no minor fear because search query content can disclose identities and personally identifiable information such as user‐initiated searches for their own social security or credit card numbers, or their mistakenly pasted but revealing text.”50

Similarly, in a declaration submitted to the court describing the kinds of personal information that can end up in the company’s search query logs, Google engineer Matt Cutts stated:

“There are ways in which a search query alone may reveal personally identifying information. For example, many internet users have experienced the mistake of trying to copy‐and‐paste text into the search query box, only to find that they have pasted something that they did not intended. Because Google allows very long queries, it is possible that a user may paste a fragment of an email or a document that would tie the query to a specific person. Users could also enter information such as a credit card, a social security number, an unlisted phone number or some other information that can only be tied to one person. Some people search for their credit card or social security number deliberately in order to check for identity theft or to see if any of their personal information is findable on the Web.”51

67. The Court eventually agreed with Google:

“Basic identifiable information may be found in the text strings when users search for personal information such as their social security numbers or credit card numbers through Google in order to determine whether such information is available on the Internet. The Court is also aware of so‐called ‘vanity searches,’ where a user queries his or her own name perhaps with other information. Google's capacity to handle long complex search strings may prompt users to engage in such searches on Google. Thus, while a user's search query reading ‘[username] stanford glee club’ may not raise serious privacy concerns, a user's search for ‘[user name] third trimester abortion san jose,’ may raise certain privacy issues as of yet unaddressed by the parties' papers. This concern, combined with the prevalence of Internet searches for sexually explicit material ‐‐ generally not information that anyone wishes to reveal publicly ‐‐ gives this Court pause as to whether the search queries themselves may constitute potentially sensitive information.”52

50 Memorandum in Opposition to the Government's Motion to Compel filed by Google Inc, in Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006) http://docs.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/12/

51 Declaration of Matt Cutts, February 17, 2006, in Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006) at page 9, available at: http://docfiles.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/14/0.pdf

68. Google’s privacy policy makes express claims regarding the limited scenarios in which the company will share sensitive user data. As such, per the FTC Policy Statement on Deception, materiality of this claim is presumed.

69. Even if search queries are not considered personal information, Google has made deceptive statements about the situations in which it will share such non‐personal information. The company’s current privacy policy states that:

“We may share with third parties certain pieces of aggregated, non‐personal information, such as the number of users who searched for a particular term, for example, or how many users clicked on a particular advertisement. Such information does not identify you individually.”53

The company thus represents that the information it will more generally share with third parties is aggregate level information. However, the information obtained by web sites via the browser referrer header consists of specific, individual queries, and it arrives together with the IP address of the user who conducted the search.

70. As described in the FTC Policy Statement on Deception, a material representation, omission or practice is one that is likely to affect a consumer's choice of or conduct regarding a product.

52 ORDER by Judge James Ware granting in part and denying in part 1 Motion to Compel Compliance with Subpoena. In Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006), http://docs.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/31/

53 Google Privacy Policy, March 11, 2009, http://www.google.com/privacypolicy.html. Interestingly enough, the company has deleted this sentence from its new privacy policy, which takes effect on October 3, 2010. See: Google Privacy Policy (Preview), October 3, 2010, available at: http://www.google.com/privacypolicy_2010.html.

71. Google’s misrepresentation of the extent to which it shares search query data is material, because if consumers knew that their search queries are being widely shared with third parties, they would be less likely to use Google’s services. Google itself argued this very point in Gonzales v. Google:

“Google users trust that when they enter a search query into a Google search box, not only will they receive back the most relevant results, but that Google will keep private whatever information users communicate absent a compelling reason. …. The privacy and anonymity of the service are major factors in the attraction of users – that is, users trust Google to do right by their personal information and to provide them with the best search results. If users believe that the text of their search queries into Google's search engine may become public knowledge, it only logically follows that they will be less likely to use the service.”54

Similarly, Google engineer Matt Cutts wrote in a declaration to the court that:

“Google does not publicly disclose the searches (sic) queries entered into its search engine. If users believe that the text of their search queries could become public knowledge, they may be less likely to use the search engine for fear of disclosure of their sensitive or private searches for information or .”55

REQUEST FOR RELIEF

72. I request that the Commission investigate Google, enjoin its deceptive business practices, and require Google to protect the privacy of Google’s customers. Specifically, I request that the Commission:

a. Compel Google to take proactive steps to protect the privacy of individual users’ search terms, such that they are not leaked to third parties through browser referrer headers and other, similar channels.

b. Compel Google to place a prominent notice on its search engine home page, describing the company’s long‐standing practice of proactively sharing users’ sensitive search queries with third parties, and provide a link to a web page detailing the steps taken by the company to protect the information in the future.

54 Memorandum in Opposition to the Government's Motion to Compel filed by Google Inc, in Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006) http://docs.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/12/

55 Declaration of Matt Cutts, February 17, 2006, in Gonzales v. Google, 234 F.R.D. 674 (N.D. Cal. 2006), available at: http://docfiles.justia.com/cases/federal/district‐courts/california/candce/5:2006mc80006/175448/14/0.pdf

c. Compel Google to notify, via electronic mail, each logged‐in Google user whose search query data has been intentionally leaked to third party sites via their browsers’ referrer header.

d. Compel Google to take reasonable steps to protect any other sensitive information from leaking via the browser referrer header from any of its other web based services.

e. Compel Google to add an easy to use feature to its Chrome browser that can be enabled with a single mouse click, and to proactively protect referrer header information when the user is in “private browsing mode”.

f. Compel Google to refrain from explicitly or implicitly misrepresenting the extent to which it protects or discloses any information maintained about consumers in the future, including search engine queries, IP address information and other data transmitted via browser headers, the methods by which it “anonymizes” retained log data, and the limitations of such techniques.

g. Compel Google to obtain an annual assessment and report from a qualified, objective, independent third‐party professional, using procedures and standards generally accepted in the profession, within one hundred and eighty (180) days after service of the Commission’s order, and annually thereafter for twenty (20) years after service of the Commission’s order, that:

i. Set forth the specific administrative, technical, and physical safeguards that Google has implemented and maintained during the reporting period to limit data retention and protect the privacy of consumer data.

ii. Explain how such safeguards are appropriate to Google’s size and complexity, the nature and scope of Google’s activities, and the sensitivity of the personal information collected from or about consumers.

iii. Explain how the safeguards that have been implemented meet or exceed the protections required by other parts of the Commission’s order.

iv. Certify that Google’s security program is operating with sufficient effectiveness to provide reasonable assurance that the security, confidentiality, and integrity of personal information is protected and, for annual reports, has so operated throughout the reporting period.

h. Take any and all action the Commission deems appropriate pursuant to the Safe Harbor agreement between the United States and European Union; and

i. Compel any other relief the Commission deems appropriate.

I reserve the right to supplement this petition as other information relevant to this proceeding becomes available.

Respectfully submitted,

Christopher Soghoian PO Box 2266 Washington, DC 20013 Telephone: 617‐308‐6368 Email: [email protected]

September 6, 2010
