You Are How (And Where) You Search? Comparative Analysis of Web Search Behaviour Using Web Tracking Data

Aleksandra Urman (1,2), Mykola Makhortykh (1)
(1) University of Bern, (2) University of Zurich
urman@ifi.uzh.ch
arXiv:2105.04961v1 [cs.HC] 11 May 2021

Abstract

We conduct a comparative analysis of the desktop web search behaviour of users from Germany (n=558) and Switzerland (n=563), based on a combination of web tracking and survey data. We find that web search accounts for 13% of all desktop browsing, with the share being higher in Switzerland than in Germany. In over 50% of cases users clicked on the first search result, and over 97% of all clicks were made on the first page of search outputs. Most users rely on Google when conducting searches, and users' preferences for other engines are related to their demographics. We also test relationships between user demographics and the daily number of searches, the average share of search activities in one's general browsing behaviour, and the tendency to click on higher- or lower-ranked results. We find differences in these relationships between the two countries, which highlights the importance of comparative research in this domain. Further, we observe differences in the temporal patterns of web search use between women and men, indicating the necessity of disaggregating data by gender in observational studies of online information behaviour.

Introduction

Web search engines are ubiquitous nowadays and act as major information gatekeepers in high-choice media environments. Google alone handled around 6.9 billion queries per day in 2020 (Petrov 2019), with the average Google.com user turning to the site 18.15 times per day as of April 2021 (Alexa 2021). When Google experienced a 5-minute outage in 2013, global web traffic dropped by 40% (Svetlik 2013). The numbers are staggering, especially given that Google is only one of many search engines, albeit the dominant one in most markets. Furthermore, search engines are highly trusted by their users: according to the Edelman Trust Barometer (Edelman 2021), in 2020 search engines were reported to be the most trusted information source globally.

Given the importance of search engines for shaping public opinion, it is crucial to understand users' web search behaviours. Yet our knowledge in this context remains limited and relies primarily on two types of data: eye-tracking (Pan et al. 2007; Schultheiß, Sünkler, and Lewandowski 2018) and search engine transaction log data (Jansen and Spink 2006; Weber and Jaimes 2011). Both of these data sources have their limitations. Eye-tracking studies typically rely on small user samples and can hardly be generalized to broader populations. By contrast, log-based studies capture the behaviour of large groups of users, but only at the aggregate level, which limits the possibilities for inferring the impact of users' individual characteristics on how they search for information. Additionally, log-based studies cannot reliably infer the connection between search result ranking and user behaviour. Researchers cannot identify in retrospect how search results were ranked and presented to individual users, due to temporal changes in the results and the effects of search personalization (Hannak et al. 2013; Kliman-Silver et al. 2015) and randomization (Makhortykh, Urman, and Ulloa 2020; Urman, Makhortykh, and Ulloa 2021).

In the present study, we address these limitations by relying on a type of data source that, to the best of our knowledge, has not been used in the context of web search behaviour. We combine web tracking data (Christner et al. 2021) with demographic data about individual users, acquired via a survey, to explore users' web search behaviour. Web tracking data includes information on users' desktop-based browsing behaviour along with the actual HTML of the browsed content. By acquiring the HTML of the pages viewed by the users, we can infer the exact composition and ranking of the web search results users were exposed to and, consequently, find out which of these results they clicked on.

Using the combination of web tracking and survey data collected in Germany and Switzerland in spring 2020, we aim to address several gaps in the existing scholarship on web search behaviour. First, we scrutinize the effect of individual demographic characteristics on search behaviour using a large sample of users. Second, unlike earlier large-scale (i.e., log-based) search behaviour studies, which focused on single-country populations (usually the US), our study offers a comparative perspective and goes beyond the US context. Third, we examine users' clicking behaviour in relation to web search result ranking in real-life conditions, in contrast to eye-tracking studies, which typically rely on smaller samples and are carried out in lab settings.

Specifically, we address the following research questions: 1) how frequently do users with different demographic characteristics and socio-economic status use search engines?; 2) what are the temporal patterns of web search use, and do they differ by demographics?; 3) are there demographic or socio-economic status-based differences in the choice of specific search engines (i.e., Google/Bing/other)?; 4) how does the rank of a search result relate to the clicking behaviour of users with different demographics? We also examine country-level differences in relation to each of the four questions.

Related work

Studies on web search behaviour to date have relied on one of two data source types: eye-tracking and search engine transaction log data.

Eye-tracking-based studies are typically conducted in lab settings on smaller samples, which are usually not demographically representative. The advantage of such studies is that they allow examining user attention patterns in the context of web search and, for instance, exploring the relation between the ranking of search results and users' clicking behaviours. In one of the earliest studies (Granka, Joachims, and Gay 2004), the authors examined the attention and clicking patterns of web search engine users in a student sample (n=36) and found that top-ranked results receive disproportionately more attention and clicks than lower-ranked ones. This finding was corroborated in numerous further studies (e.g., Pan et al. 2007; Schultheiß, Sünkler, and Lewandowski 2018; Joachims et al. 2007; Guan and Cutrell 2007).

Eye-tracking studies have also investigated the impact of additional factors on search result selection. For instance, two US studies found that clicking decisions are influenced not only by ranking but also by the perceived relevance of search results. One used a small sample (n=18) of users of diverse ages and occupations (Joachims et al. 2007), and the other used a student sample (n=22) (Pan et al. 2007). A replication of the latter study (Schultheiß, Sünkler, and Lewandowski 2018) was conducted circa 10 years after the original on a student sample (n=28) in Germany. It found similar effects, indicating that the observed effects are stable across time and national contexts.

Despite providing important insights into user search behaviour, eye-tracking studies are subject to a number of limitations, in particular their limited scalability. While some potential solutions for scaling are being offered in re[…]

[…] in short queries and rarely navigate beyond page 1 of the search engine. Similar findings were reported by the authors of a one-week study of the Korean search engine Naver (Park, Ho Lee, and Jin Bae 2005).

Log data has also been used to examine temporal aspects of web search (Zhang, Jansen, and Spink 2009) and patterns of search query usage (Weber and Jaimes 2011). Such studies allow inferring real-life web search usage patterns and, in contrast to eye-tracking-based lab studies, are based on large data samples. However, log-based studies also have several limitations. First, because search log data owned by proprietary companies is difficult to obtain, most transaction-log-based studies focus on a single search engine. This undermines the generalizability of their findings, since usage patterns can be affected by differences in search engine interfaces and/or in the demographics of their users. Even studies that analyze log data from multiple search engines (e.g., Jansen and Spink 2006) cannot match users across these engines. This prevents them from examining whether the same users use several different engines and, if so, whether and how their behaviour differs depending on the engine.

The absence of reliable demographic data about the users is another limitation of log-based studies. Such data is sometimes available on users' gender and age, but not on other variables, such as education or income level, which can only be inferred by the researchers (e.g., Weber and Jaimes 2011). However, even such inference-based studies are rare. Third, log-based data is inherently noisy, because search requests might be executed not only by human users but also by bots, and it is difficult to differentiate between organic and automated requests (Jiang, Pei, and Li 2013). Finally, transaction log data does not allow tracing the position of the search result a user clicked on; the only ranking-related parameter available is the number of the search result page on which a user selected a result.

The aforementioned limitations of both approaches can be addressed by combining survey data with web tracking data that includes the full HTML of the pages browsed by users. Unlike eye-tracking, this approach is scalable and allows observing user behaviour in real-life circumstances rather than in a lab setting.
