Health Policy 123 (2019) 338–341

Contents lists available at ScienceDirect

Health Policy

j ournal homepage: www.elsevier.com/locate/healthpol

Google Trends: Opportunities and limitations in health and health

policy research

a,∗ b c

Vishal S. Arora , Martin McKee , David Stuckler

a

Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA, USA

b

London School of Hygiene & Tropical Medicine, ECOHOST, 15-17 Tavistock Place, London, WC1H 9SH, United Kingdom

c

Università Bocconi, Via Sarfatti, 25, Milano, Italy

a r t i c l e i n f o a b s t r a c t

Article history: Web search engines have become pervasive in recent years, obtaining information easily on a variety

Received 1 April 2015

of topics, from customer services and goods to practical information. Beyond these search interests,

Received in revised form 3 January 2019

however, there is growing interest in obtaining health advice or information online. As a result, health

Accepted 6 January 2019

and health policy researchers are starting to take note of potential data sources for surveillance and

TM

research, such as Trends , a publicly available repository of information on real-time user search

Keywords: TM

patterns. While research using Google Trends is growing, use of the dataset still remains limited. This

Health policy

paper offers an overview of the use of such data in a variety of contexts, while providing information on

Social media

Internet its strengths, limitations, and recommendations for further improvement.

© 2019 Published by Elsevier B.V.

Health behaviours

1. Introduction a health-related topic online, ranging from mental health, immu-

nizations to sexual health information [4].

Google dominates the market for internet search engines and is Health and health policy researchers are also starting to take

so pervasive that the term “to Google” has entered everyday use note of the potential of these data. A PubMed search for “Google

TM

in a way that none of its competitors has. In Europe it is used in Trends” or “Google Insights” (previous version of Google Trends )

85% of internet searches [1] while in the US, it accounts for 65% revealed an over 20-fold increase in original research articles or

TM

[2]. In 2012, Google handled approximately 1.2 trillion searches research letters using Google Trends from 2009 to 2018 (Fig. 1).

globally, or 3.3 billion searches per day [3]. Although more recent While much of this information is kept secret by online providers,

data are difficult to obtain because of commercial confidentiality, other elements are available to anyone. This paper offers a general

it has been estimated that the total number rose to about 2 tril- guide to how Google collects and shares search engine data, how

lion in 2018. In an era where web searches and transactions are their data has been used in the past, and how health and health

recorded instantaneously, this activity generates a massive volume policy researchers can make greater use of these data in the future.

of data whose uses are often unexpected and virtually limitless.

Every search that is undertaken, and every page that is viewed, is

TM

tracked. Consequently, Google itself, along with many online con- 2. Google Trends : what it is and how it works

tent providers, such as Amazon, make extensive use of these data,

TM

tailoring advertisements and the results of searches to each user’s Google Trends is the principle tool used to study trends

browsing history. and patterns of search engine queries using Google. It is one of

While individuals search for many things online, such as con- a suite of Google tools that track different types of activity, such

TM

sumer goods and services, or practical information (such as opening as , which records citations of papers, and Google

TM

hours or travel timetables), there is also significant search activity Analytics , which allows the owner of a website to track where

related to health concerns. According to the Pew Internet and Amer- and when people are viewing that site. As in any epidemiolog-

ican Life Project, 80% of Internet users in the US have searched for ical study, it is important to take account of the denominator

when interpreting counts. Google Trends does this by expressing

the absolute number of searches relative to the total number of

searches in each location and at each time. The number of searches

for each term (e.g. “alcoholism”) relative to total searches is referred

Corresponding author.

E-mail address: vishal [email protected] (V.S. Arora). to as the query share. This query share is then normalised to the

https://doi.org/10.1016/j.healthpol.2019.01.001

0168-8510/© 2019 Published by Elsevier B.V.

V.S. Arora et al. / Health Policy 123 (2019) 338–341 339

2008 financial crisis. In contrast, the highest intensity in the USA

was back in 2004.

TM

Google Trends offers a high level of geographical precision

in developed countries, allowing for searches to be stratified at a

national, regional, and city level. Trends in searches for different

terms can be compared and multiple search terms can be combined,

with a “+” sign, to identify those searching for terms in combination

(e.g. “alcoholism + treatment”).

Lastly, Google uses natural language processing methods and

indexes web pages collected to classify search queries into one of 25

specific categories (including health) and over 300 sub-categories.

TM

Unlike other public health datasets, what makes Google Trends

data so novel is that it is collected and reported in real time and is

publicly available.

Fig. 1. Research using Google Trends/ Insights, 2009 to 2018.

Source: PubMed search for records with “Google Trends” or “Google Insights” in title

or abstract. TM

3. Applications of Google Trends in population health

research

highest volume of searches for that term over the time period being

studied. This index ranges from 0 to 100, with 100 recorded on Initially, one of the best known applications of

the date that saw the highest relative search volume activity for engine query data in population health research was early detec-

TM

that term. Thus, a Google Trends search index of 25 indicates tion of influenza epidemics. In 2009, collaborators at Google and

that search activity for a particular term was 25% of that seen at the US Centers for Disease Control found that the relative frequency

the time when search activity was most intense. Fig. 2 illustrates of searches for influenza-like illness correlated well with the per-

this, and also how patterns can vary across countries. Thus, using centage of physician visits for influenza in the United States, with

searches for the term “antidepressants” (and equivalent in relevant a 1-day reporting lag (traditional CDC estimates have a 1–2 week

languages) over the period 2004 to 2018, the highest intensities in reporting lag) [5]. Since then, however, the uses of Google search

three of the countries, Australia, France, and the United Kingdom, engine data in population health research have grown substan-

have been in 2017/18, with levels around 20 until the onset of the tially. Many examples involve early detection of other infectious

Fig. 2. Search activity for anti-depressants in France, Australia, United Kingdom, and USA January 2004 to December 2018.

Note: Search terms – France, antidepressants + antidépresseur; UK/ Australia – antidepressants; USA - antidepressants + antidepresivo (Spanish).

TM

Source: Google Trends searched 31st December 2018.

340 V.S. Arora et al. / Health Policy 123 (2019) 338–341

Box 1: Tracking interest in electronic cigarettes: Evi- Box 3: Uncovering racism.

dence from Australia, Canada, United Kingdom, and the Social desirability bias exists where people answer questions

United States. in ways that they expect will be viewed favourably by others.

In the past few years electronic cigarettes (e-cigarettes) have This makes research on issues such as racism difficult. One

been marketed intensively [16]. Although the clinical evidence study measured the proportion of Google searches for the “N-

that they aid quitting is weak, social media abounds with word” in 196 locations covering the USA [26]. They found a

claims that they are effective in this respect. To determine significant association with mortality among African Ameri-

how consumer interest in these products has evolved, Ayers cans, even after adjusting for white mortality rates. As the

TM

et al. [36] examined Google Trends search activity for e- authors noted, these findings are consistent with other liter-

cigarettes against traditional smoking cessation products (e.g. ature on the adverse associations between racism and health.

nicotine replacement therapy) from July 2008 to February 2010.

Their results showed that search activity for e-cigarettes rapidly

surpassed that of any of the traditional smoking cessation

products. For example, search volume for e-cigarettes was The ability to combine search terms unlocks the potential for

® ®

some especially imaginative approaches. For example, White et al.

300% and 160% higher than Chantix or Champix (vareni-

cline) in the US and UK, respectively. Furthermore, search [27] demonstrated the ability to identify interactions between

activity for e-cigarettes within the US was significantly higher drugs from searches for their names in combination, something

for those states that had stronger tobacco control measures, that would easily be missed using routine post-marketing surveil-

as determined by the American Lung Association. lance.

TM

4. Where we go from here with Google Trends

Box 2: Association of abortion-related searches with TM

While Google Trends has been able to provide valuable

abortion policies and availability: A global perspective.

insights into population health surveillance and behaviours, the

There are few, if any, medical procedures that provoke as much

intense political debate as aborting a foetus. Moreover, as abor- data are subject to certain caveats. Among these limitations, the

tions remain illegal or heavily restricted in many jurisdictions, most obvious one is that all of the search data available through

TM

leading women to obtain them illegally or in other countries, Google Trends is anonymised and reflects those with internet

accurate data on abortion provision is often unreliable and

access, potentially excluding vulnerable groups (e.g. elderly) or

outdated. Because information identifying those undertaking

regions where internet uptake could be low (e.g. some parts of

internet searches is not in the public domain, data on volume

low- and middle-income countries). Researchers will not be able

of searches may provide a proxy for interest in obtaining an

to know who is searching for health terms and what their inten-

abortion. Based on this assumption, Reis and Brownstein [17]

TM tions might be. While Google does use natural language processing

explored the relationships between Google Trends search

methods to code health-related searches, this is not available for all

activity, local abortion rates, and local abortion policies. Both

countries and languages. Third, it is still not quite clear what search

in the US and internationally, the study found search volume

for abortions was inversely related to local abortion rates. terms one should use when exploring a particular health behaviour

However, search volume was also significantly higher in those (e.g. depression, alcoholism, etc.). Fourth, studies should be based

regions of the US where barriers to an abortion were more on a clear conceptual model of behaviour, where appropriate, in

stringent (e.g. mandatory parental notification for minors).

which the searches can plausibly be linked to ideas and ultimately

behaviour. This is not always the case [28]. Finally, it is important to

be aware of the risk of reporting bias, with only those studies find-

outbreaks, such as Lyme disease [6], a selection of tropical diseases ing positive correlations being published, as has been suggested

in India [7], syphilis [8], HIV [9], and Zika virus infections [10]. recently [29].

Over time, Google search engine data has increasingly been uti- Research using search engine activity is still in its infancy and

lized to understand health behaviours. For example, it has been there are some things that population health researchers and

shown that changes in the volume of suicide-related searches may Google might do to maximise the value of this method.

provide an early warning of changing mental health risk [11–13], The first relates to consistency of reporting. As Nuti et al. [30]

TM

although the association is strongest for suicides among younger note in their systematic review on Google Trends , there is no

people and middle-aged women, both groups more likely to take defined consensus on how to document Google search engine

overdoses (and thus require information on how to do it) than queries in academic papers. For example, only 19% of the publi-

among older men, among whom hanging is more common [14]. A cations they identified had defined the search category used (e.g.

related study examined the commonly held view that media cov- health) and only 39% of papers provided an explicit search strategy.

TM

erage of celebrity suicides can either increase or decrease suicidal As a minimum, research utilising Google Trends data should doc-

ideation, finding limited evidence for both, but only with the most ument the exact search terms inputted, the translations used when

prominent celebrities [15]. searching in other than English, category used, a downloadable

TM

Some other novel uses of Google Trends have included mon- spreadsheet of extracted search indices, and the date the analysis

itoring interest in electronic cigarettes [16], abortions [17], and was performed.

bariatric surgery [18], assessing the relationship between health- The second relates to how health-related searches are collected

related searches and economic conditions (e.g. unemployment and categorised, especially if a search term might be misconstrued

rates) [19–21], and tracking pharmaceutical utilisation and rev- as non-health related (e.g. “smoking” when juxtaposed with “chim-

enues [22] (Boxes 1 and 2 ). Several studies have also examined ney” or “gun”). This is an area where there is considerable scope for

seasonality of events not easily identifiable from existing data dialogue between population health researchers and Google, tak-

sources [23,24]. A related application was seen in a study that quan- ing advantage of advances in artificial intelligence. Without any

tified the number of days during that increased interest in smoking transparency on how algorithms for search terms are calculated, it

cessation lasted after a tax rise [25]. Information on what people will be difficult for researchers to know just how accurately search

search has even provided insights on societal attitudes, such as activity can model trends in search activity within a population

racism, which is less easily discernible in survey data (Box 3) [26]. [31].

V.S. Arora et al. / Health Policy 123 (2019) 338–341 341

The third relates to ethical issues. Unlike research that uses post- [11] McCarthy MJ. Internet monitoring of suicide risk in the population. Journal of

TM Affective Disorders 2010;122:277–9.

ings on social media [32], Google Trends data are anonymised.

[12] Gunn JF, Lester D. Using google searches on the internet to monitor suicidal

However, it is conceivable that it may be possible to identify an

behavior. Journal of Affective Disorders 2013;148:411–2.

individual living in a particular location who has a very rare dis- [13] Ayers JW, Althouse BM, Allem J-P, Rosenquist JN, Ford DE. Seasonality in seeking

mental health information on Google. American Journal of Preventive Medicine

ease, finding which other search terms were used in combination

2013;44:520–5.

with that disease. Given advances in artificial intelligence, it will be

[14] Arora VS, Stuckler D, McKee M. Tracking search engine queries for suicide in

important to monitor the situation for unintended consequences, the United Kingdom, 2004–2013. Public Health 2016;137:147–53.

[15] Gunn JF, Goldstein SE, Lester D. The impact of widely publicized suicides on

such as those that have emerged with social media [33].

search trends: using Google Trends to test the Werther and Papageno effects.

Fourth, there is considerable scope for methodological develop-

Archives of Suicide Research 2018:1–24, 0.

ment, drawing on new approaches to the analysis of search engine [16] McKee M. E-cigarettes and the marketing push that surprised everyone. BMJ

data from other fields. Thus, research on the propagation of memes, 2013;347, f5780–f5780.

[17] Reis BY, Brownstein JS. Measuring the impact of health policies using Internet

or ideas (e.g. marketing imagery, such as that employed by the

search patterns: the case of abortion. BMC Public Health 2010;10:514.

manufacturers of products that may impact of health) has used

[18] Rahiri J-L, et al. Using Google Trends to explore the New Zealand public’s inter-

concepts from infectious disease modelling [34]. One recent sys- est in bariatric surgery. ANZ Journal of Surgery 2018;88:1274–8.

[19] Althouse BM, Allem J-P, Childers MA, Dredze M, Ayers JW. Population health

tematic review has identified forecasting as an area in particular

concerns during the United States’ great recession. American Journal of Pre-

need of development [35].

ventive Medicine 2014;46:166–70.

TM

Data from Google Trends and other search engine repositories [20] Tefft N. Insights on unemployment, unemployment insurance, and mental

health. Journal of Health Economics 2011;30:258–64.

will never replace traditional data collection methods for popula-

[21] Frijters P, Johnston DW, Lordan G, Shields MA. Exploring the relationship

tion health. However, with further refinements and constructive

between macroeconomic conditions and problem drinking as captured by

dialogue between researchers and Google, search engine data can Google searches in the U.S. Social Science & Medicine 2013;1982(84):61–8.

[22] Schuster Nathaniel M, Rogers Mary AM, McMahon Jr Laurence F. Using search

offer a powerful, real-time tool to assess how population health and

engine query data to track pharmaceutical utilization: a study of statins. Amer-

health behaviours are changing within society.

ican Journal of Managed Care 2010;16.

[23] Association of Search Engine Queries for Chest Pain With Coronary Heart

Disease Epidemiology, Available at: | Cardiology | JAMA cardiology | JAMA

Key

network; 2019 (Accessed: 3rd January 2019) https://jamanetwork-com.ezp- prod1.hul.harvard.edu/journals/jamacardiology/fullarticle/2706608.

• TM

Google Trends is the principle tool used to study trends [24] Seasonal and geographic patterns in seeking cardiovascular health informa-

tion: an analysis of the online search trends – ScienceDirect; 2019. Avail-

and patterns of search engine queries—including health-related

able at: https://www-sciencedirect-com.ezp-prod1.hul.harvard.edu/science/

queries—using Google.

• article/pii/S0025619618305779?via%3Dihub (Accessed: 3rd January 2019).

Search engine data in public health has diverse applications in [25] Tabuchi T, Fukui K, Gallus S. Tobacco price increases and population interest in

the literature, from tracking influenza outbreaks to monitoring smoking cessation in Japan between 2004 and 2016: a Google Trends analysis.

Nicotine & Tobacco Research 2018, http://dx.doi.org/10.1093/ntr/nty020.

interest in e-cigarettes.

[26] Chae DH, et al. Association between an internet-based measure of area racism

• TM

Google Trends still has significant limitations that need

and black mortality. PLoS One 2015;10:e0122963.

to be addressed through dialogue between population health [27] White RW, Tatonetti NP, Shah NH, Altman RB, Horvitz E. Web-scale phar-

macovigilance: listening to signals from the crowd. Journal of the American

researchers and Google, particularly regarding how search engine

Medical Informatics Association 2013;20:404–8.

queries are collected, organised, and coded.

[28] Low validity of Google Trends for behavioral forecasting of national suicide

rates; 2019. Available at: https://journals.plos.org/plosone/article?id=10.1371/

References journal.pone.0183149 (Accessed: 3rd January 2019).

[29] Cervellin G, Comelli I, Lippi G. Is Google Trends a reliable tool for digital epi-

demiology? Insights from different clinical settings. Journal of Epidemiology

[1] Scott M. Qwant wants to be alternative to Google. Bits Blog; 2014.

and Global Health 2017;7:185–9.

[2] Search engine market share in the United States. Statistic. In: Statista;

[30] Nuti SV, et al. The use of Google Trends in health care research: a systematic

2018. Available at: https://www.statista.com/statistics/267161/market-share-

review. PLoS One 2014;9:e109583.

of-search-engines-in-the-united-states/ (Accessed: 3rd January 2019).

[31] Butler D. When Google got flu wrong. 2013;494:155–6.

[3] Google Zeitgeist 2012: a year in search; 2019. Available at: http://www.google.

[32] McKee R. Ethical issues in using social media for health and health care research.

co.uk/zeitgeist/2012/#the-world (Accessed: 8th February 2015).

Health Policy 2013;110:298–301.

[4] Internet & Technology - Pew Research Center.

[33] McKee M, Stuckler D. How the internet risks widening health inequalities.

[5] Ginsberg J, et al. Detecting influenza epidemics using search engine query data.

American Journal of Public Health 2018;108:1178–9.

Nature 2009;457:1012–4.

[34] Wang L, Wood BC. An epidemiological approach to model the viral propagation

[6] Kapitány-Fövény M, et al. Can Google Trends data improve forecasting of Lyme

of memes. Applied Mathematical Modelling 2011;35:5442–7.

disease incidence? Zoonoses Public Health 2018, 0.

[35] Mavragani A, Ochoa G, Tsagarakis KP. Assessing the methods, tools, and sta-

[7] Verma M, et al. Google search trends predicting disease outbreaks: an analysis

tistical approaches in Google Trends research: systematic review. Journal of

from India. Healthcare Informatics Research 2018;24:300–8.

Medical Internet Research 2018;20:e270.

[8] Young SD, Torrone EA, Urata J, Aral SO. Using search engine data as a tool to

[36] Ayers JW, Ribisl KM, Brownstein JS. Tracking the rise in popularity of electronic

predict syphilis. Epidemiology 2018;29:574–8.

nicotine delivery systems (electronic cigarettes) using search query surveil-

[9] Young SD, Zhang Q. Using search engine for predicting new HIV diag-

lance. American Journal of Preventive Medicine 2011;40:448–53.

noses. PLoS One 2018;13:e0199527.

[10] Morsy S, et al. Prediction of Zika-confirmed cases in Brazil and Colombia using

Google Trends. Epidemiology & Infection 2018;146:1625–7.