An Empirical Study of the Use of Integrity Verification Mechanisms for Web Subresources Bertil Chapuis, Olamide Omolola, Mauro Cherubini, Mathias Humbert, Kévin Huguenin

To cite this version:

Bertil Chapuis, Olamide Omolola, Mauro Cherubini, Mathias Humbert, Kévin Huguenin. An Empiri- cal Study of the Use of Integrity Verification Mechanisms for Web Subresources. The Web Conference (WWW), Apr 2020, Taipei, Taiwan. pp.34-45, ￿10.1145/3366423.3380092￿. ￿hal-02435688￿

HAL Id: hal-02435688 ://hal.archives-ouvertes.fr/hal-02435688 Submitted on 20 Jan 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. An Empirical Study of the Use of Integrity Verification Mechanisms for Web Subresources Bertil Chapuis Olamide Omolola Mauro Cherubini UNIL – HEC Lausanne TU Graz UNIL – HEC Lausanne Switzerland Austria Switzerland [email protected] [email protected] [email protected] Mathias Humbert Kévin Huguenin armasuisse S+T UNIL – HEC Lausanne Switzerland Switzerland [email protected] [email protected] ABSTRACT 1 INTRODUCTION Web developers can (and do) include subresources such as scripts, The Web is a set of interlinked resources identied by their URLs. stylesheets and images in their webpages. Such subresources might A signicant portion of these resources consists of HTML web- be stored on content delivery networks (CDNs). This practice cre- pages that include navigable links and subresources, such as scripts, ates security and privacy risks, should a subresource be corrupted. stylesheets, images or videos. A change in a subresource can aect The subresource integrity (SRI) recommendation, released in mid- the webpage that includes it. 2016 by the W3C, enables developers to include digests in their With the advent of content delivery networks (CDNs), an increas- webpages in order for web browsers to verify the integrity of sub- ing number of subresources are hosted by third-party providers resources before loading them. In this paper, we conduct the rst (in this paper, we will refer to these as external subresources, also large-scale longitudinal study of the use of SRI on the Web by ana- called cross-domain subresources). The advantages provided by lyzing massive crawls (≈3B URLs) of the Web over the last 3.5 years. such platforms include reduced costs and latency as well as in- Our results show that the adoption of SRI is modest (≈3.40%), but creased reliability. However, their usage comes at a security price: grows at an increasing rate and is highly inuenced by the practices A subresource can be altered (accidentally or not) upon transmis- of popular library developers (e.g., Bootstrap) and CDN operators sion from these third-party providers or directly on them, as it was (e.g., jsDelivr). We complement our analysis about SRI with a survey the case for the British Airways in 2018 [14]. The consequences of web developers (푁 =227): It shows that a substantial proportion can be dramatic, including the theft of user credentials (i.e., on of developers know SRI and understand its basic functioning, but login pages) and credit card data (i.e., on payment pages), malware most of them ignore important aspects of the recommendation. injection, and website defacement (i.e., modication of the content). The results of the survey also show that the integration of SRI by In general, when an external subresource is included in a webpage, developers is mostly manual – hence not scalable and error prone. there is no guarantee that its content will remain the same. This calls for a better integration of SRI in build tools. The subresource integrity (SRI) recommendation [46], released in mid-2016 by the W3C, addresses this issue by enabling web CCS CONCEPTS developers/webmasters to include digests in order for web browsers to verify the integrity of subresources before loading them. SRI is • Security and privacy → Web protocol security; Hash functions implemented in the vast majority of desktop and mobile browsers. and message authentication codes. Unfortunately, no in-depth analysis about the adoption and the KEYWORDS understanding of SRI (by web developers) has been performed so web security; subresource integrity; common crawl far. Most existing works [25, 39] focus on modest-size datasets of webpages, focus on one snapshot only, and only look at the ACM Reference Format: basic statistics, e.g., they do not study the main factors behind the Bertil Chapuis, Olamide Omolola, Mauro Cherubini, Mathias Humbert, adoption/usage of SRI. Our work lls this gap (i) by conducting the and Kévin Huguenin. 2020. An Empirical Study of the Use of Integrity Verication Mechanisms for Web Subresources. In Proceedings of The Web rst large-scale longitudinal study on the adoption/usage of SRI Conference 2020 (WWW ’20), April 20–24, 2020, Taipei, Taiwan. ACM, New on the Web, and (ii) by surveying web developers regarding their York, NY, USA, 13 pages. https://doi.org/10.1145/3366423.3380092 understanding and usage of SRI. Contributions. By relying on a massive dataset of about 3B URLs, we rst thoroughly analyze the use of the SRI recommendation on This paper is published under the Creative Commons Attribution 4.0 International the Web over the last 3.5 years. We began our analysis in May 2016, (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their right before the ocial release of the SRI recommendation and took personal and corporate Web sites with the appropriate attribution. WWW ’20, April 20–24, 2020, Taipei, Taiwan a snapshot approximately every six months. Our analysis of the © 2020 IW3C2 (International World Wide Web Conference Committee), published 3B webpages shows the following: rst, we measure the extent of under Creative Commons CC-BY 4.0 License. the most typical threat, i.e., the use of external subresources, and ACM ISBN 978-1-4503-7023-3/20/04. https://doi.org/10.1145/3366423.3380092 we nd that more than 80% of the webpages include at least one WWW ’20, April 20–24, 2020, Taipei, Taiwan Chapuis et al. external subresource; second, we study the adoption and usage of " ! ! " SRI over time and observe an increasing, but still modest usage, http(s)://www.server.com/ http(s)://www.otherserver.com/ + scripts/ - style.css from less than 1% in October 2017 to 3.40% in September 2019. We - script.js - image.png - index. observe that the use of SRI is linked to a number of popular libraries (1) Fetch webpage Webpage and CDN operators, including Bootstrap and jsDelivr, that provide sanne (Switzerland) is currently researching web development prac- No knowledge [skip to demographic section] tices. By lling this survey, you will contribute to the research and Little knowledge (I heard the term) can enter a ra e draw to win a $100 Amazon voucher. This sur- Some knowledge (I know the basics) vey takes approximately 5-10 minutes to complete. The answers Moderate knowledge (I used SRI) you provide will be treated anonymously and will not be linked to Extensive knowledge (I use it often) your identity. The data we collect will be used solely for academic research (non-prot). If you have questions about this research, A.3 Knowledge of SRI feel free to e-mail us at [email protected]. The survey will run (1) Please describe in your own words how you can use subre- until 15th September 2019, and the questionnaire is best viewed source integrity (SRI) on a website as well as the purpose of with Mozilla Firefox and Google Chrome. SRI. [free text] Note: (2) In your opinion, is it meaningful to use subresource integrity (1) The survey is meant for web developers that are 18 years (SRI) if the target is served through HTTPS? old or older. Example: (2) Participants younger than 18 years old or who do not disclose Source: https://www.website.com their age group will not be able to participate in the ra e Target: https://cdn.com/js/page.js draw. [terminate] Yes

No, I am not comfortable taking the questionnaire in Eng- No lish [terminate] I am not sure (2) Are you an active web developer? (5) Please explain why you answered Yes/No to this question. Yes, I do web development on my free time [subquestion appears only if “Yes”/“No” is selected in previ- Yes, I work as a web developer ous question] No, I am not directly involved with web development (6) In your opinion, what happens when an integrity attribute [terminate] has 2 or more valid hash values (i.e., digests) created with dierent algorithms? A.2 Awareness of SRI Example: Introductory text. that are stored in a server separate from the server where the main The user-agent loads the resource only if the digest created site is hosted. with the strongest hashing algorithm matches that of the (1) Please describe possible security threats caused by exter- resource nal resources included in a website, given the hypothetical The user-agent loads the resource only if the rst digest situation described above. [free text] in the list matches that of the resource (2) If you described security threats in the previous question, The user-agent loads the resource only if any digest in the please explain possible ways in which the website in the hy- list matches that of the resource pothetical scenario could be protected against these threats. The user-agent does not load the resource at all [free text] The user-agent loads the resource in any case (3) Please rate how much you know about the subresource in- tegrity (SRI) recommendation of the W3C An Empirical Study of the Use of Integrity Verification Mechanisms for Web Subresources WWW ’20, April 20–24, 2020, Taipei, Taiwan

I am not sure 18 - 24 Other (please specify) 25 - 34 (7) In your opinion, what happens when an integrity attribute 35 - 44 has 2 or more valid digests generated with the same 45 - 54 algorithm? 55 - 64 Example: 65+ status? The user-agent loads the resource only if the rst digest Employed full-time in the list matches that of the resource Employed part-time The user-agent loads the resource only if any digest in the Independent contractor, freelancer, or self-employed list matches that of the resource Not employed, but looking for work The user-agent does not load the resource at all Not employed, and not looking for work The user-agent loads the resource in any case Retired I am not sure Other (please specify): Other (please specify) (3) What is the size of your company where you work (or own)? (8) In your opinion, what happens if the digest in an integrity [only showed if respondent is working] attribute is malformed (i.e., generated with an unsupported Individually owned company algorithm, composed of non-base64 characters, etc.)? Exam- Startup ple: Small and medium-sized enterprises (SME) Other (please specify) The user-agent does not load the resource at all (4) Do you have an IT security background? The user-agent loads the resource in any case Yes, I have a diploma in IT security I am not sure Yes, I studied IT security in a college/university without Other (please specify) earning a degree (9) How do you typically include Subresource Integrity (SRI) in Yes, I took some courses in IT security but did not spe- development? (Select all that apply) cialise in it

 Copy-paste snippets from the ocial documentation Yes, I have a university degree or equivalent in IT security

 Copy-paste snippets from online communities Yes, I had some training sponsored by the company where  Compute the checksums of the subresources and include I work them myself No, I do not have any IT security background

 Congure my build tool to compute and include the check- Other (please specify) sums automatically  I am not sure A.5 Communication  Other (please specify) (1) Do you wish to participate in the ra e draw for a $100 (10) Do you think the SRI recommendation should be extended Amazon gift card? to subresources other than the stylesheets and the Yes scripts