Abusive Advertising: Scrutinizing Socially Relevant Algorithms in a Black Box Analysis to Examine Their Impact on Vulnerable Patient Groups in the Health Sector
Total Page:16
File Type:pdf, Size:1020Kb
Abusive Advertising: Scrutinizing socially relevant algorithms in a Black Box analysis to examine their impact on vulnerable patient groups in the health sector Master Thesis by Martin Reber March 2, 2020 arXiv:2101.02018v1 [cs.CY] 4 Jan 2021 Technische Universit¨atKaiserslautern, Department of Computer Science, 67653 Kaiserslautern, Germany Examiner: Prof. Dr. Katharina Zweig Tobias Krafft Eigenst¨andigkeitserkl¨arung Hiermit versichere ich, dass ich die von mir vorgelegte Arbeit mit dem Thema Abusive Advertising: Scrutinizing socially relevant algorithms in a Black Box analysis to examine their impact on vulnerable patient groups in the health sectorselbstst¨andigverfasst habe, dass ich die verwendeten Quellen und Hilf- smittel vollst¨andigangegeben habe und dass ich die Stellen der Arbeit - ein- schließlich Tabellen und Abbildungen -, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind unter Angabe der Quelle als Entlehnung kenntlich gemacht habe. Kaiserslautern, den 2.3.2020 Martin Reber ii Abstract The targeted direct-to-customer marketing of unap- proved stem cell treatments by a questionable online industry is directed at vulnerable users who search the Internet in the hope of a cure. This behavior espe- cially poses a threat to individuals who find themselves in hopeless and desperate phases in their lives. They might show low reluctance to try therapies that solely promise a cure but are not scientifically proven to do so. In the worst case, they suffer serious side-effects. Therefore, this thesis examines the display of advertise- ments of unapproved stem cell treatments for Parkin- son's Disease, Multiple Sclerosis, Diabetes on Google's results page. The company announced a policy change in September 2019 that was meant to prohibit and ban the practices in question. However, there was evidence that those ads were still being delivered. A browser extension for Firefox and Chrome was devel- oped and distributed to conduct a crowdsourced Black Box analysis. It was delivered to volunteers and vir- tual machines in Australia, Canada, the USA and the UK. Data on search results, advertisements and top stories was collected and analyzed. The results showed that there still is questionable advertising even though Google announced to purge it from its platform. iii Zusammenfassung Die Direktvermarktung von nicht zugelassenen Stammzellbehandlungen von der fragwurdigen¨ Online- Industrie dahinter zielt auf Patienten, die das Internet in der Hoffnung auf Heilung durchsuchen. Dieses Verhalten stellt eine besondere Gefahr fur¨ Menschen dar, die sich in verzweifelten Phasen in ihrem Leben befinden. Sie k¨onnten wenig Zuruckhaltung¨ zeigen, die beworbenen Therapien auszuprobieren, die zwar eine Heilung versprechen, diese aber nicht durch anerkann- te klinische Tests belegen k¨onnen. Im schlimmsten Fall erwarten die Patienten schwerwiegende Neben- wirkungen. Daher untersucht diese Theis die oben genannten Werbeanzeigen auf der Ergebnisseite der Online- Suchmaschine Google nach einer Anderung¨ der Plat- formrichtlinien im September 2019. Besonders ging es dabei um Anzeigen bezuglich¨ Behandlugen von Parkin- son, Multipler Sklerose und Diabetes, die in den Ver- haltensregeln ausdrucklich¨ verboten wurden. Browsererweiterungen fur¨ Firefox und Chrome wur- den entwickelt und verteilt, um damit eine crowdsour- ced Black Box Analyse durchzufuhren.¨ Freiwillige Teil- nehmer und virtuelle Maschinen in Australien, Kana- da, den USA und Großbritannien wurden rekrutiert. Es wurden Daten zu Suchergebnissen, Werbung und Schlagzeilen auf der Ergebnisseite von Google gesam- melt. Die Analyse derer ergab, dass es trotz des explizi- ten Verbots dieser Praktiken noch immer fragwurdige¨ Werbung gab. Contents List of Figures vii 1. Introduction1 1.1. Motivation . .2 1.2. Outline . .4 2. Fundamentals5 2.1. Information . .5 2.2. Models . .6 2.3. Socially Relevant Algorithms . .7 2.3.1. Algorithms . .7 2.3.2. Social Relevance . .9 2.4. Communication . 11 2.4.1. Shannon's technical Model of Communication . 12 2.4.2. Watzlawick's psychological Communication Model . 13 2.4.3. Context-oriented Communication Model . 14 2.5. Socio-technical Systems . 18 2.5.1. Technical System . 19 2.5.2. Social System . 20 2.5.3. Kienle and Kunau's Socio-Technical System . 21 2.6. Data Economy and advertising . 22 2.6.1. Web-tracking and data collection . 22 2.6.2. Tracking protection . 25 2.6.3. Targeted advertising . 25 2.7. Integrated Search Engines: Google as an advertisement enabler 26 2.7.1. Integrated Search Engines . 27 2.7.2. Web Advertisement . 29 2.7.3. Business Models . 30 2.7.4. The Integrated Search Engine's role as an intermediary 34 2.8. Application to Web Search . 37 3. Related Work 41 3.1. Digitalized Health in the Realm of Stem-Cell-Tourism . 41 3.2. Governance . 46 3.2.1. The need for control . 46 3.2.2. Proposals of governance . 48 3.2.3. Challenges . 52 3.3. Algorithm Accountability . 53 3.3.1. Transparency . 54 3.3.2. Responsibility . 55 v Contents 3.4. Black Box Analysis . 60 3.4.1. Methodology . 62 3.4.2. Challenges . 67 4. EuroStemCell Data Donation 2019 / 2020 (EDD) 69 4.1. The Donation Plugin . 70 4.1.1. Requirements . 70 4.1.2. Conceptual Design . 70 4.1.3. Development . 72 4.1.4. Deployment . 75 4.2. Findings . 76 4.2.1. Data Analysis . 76 4.2.2. Limitations . 88 4.3. Lessons learned . 89 4.3.1. Study Design . 89 4.3.2. Technical . 92 5. Conclusion 97 6. Future Work 99 Bibliography 103 A. EuroStemCell Data Donation: Development 133 A.1. My Code . 133 A.2. User Story . 133 A.3. Product Backlog . 134 A.4. Participant survey . 136 A.5. Query composition and crawled HTML elements . 138 A.5.1. Query composition . 138 A.5.2. Crawled HTML elements . 139 B. EuroStemCell Data Donation: Data Analysis and Visualizations 141 B.1. Downloads . 141 B.2. Participants . 143 B.3. Advertisements and Advertisers . 143 C. Functionality of a Search Engine 145 vi List of Figures 1.1. Sample SERP before policy change I . .3 1.2. Sample SERP before policy change II . .3 2.1. Shannon's Communication Model . 12 2.2. Luhmann's triple selection . 15 2.3. Kienle & Kunau's Communication Model . 16 2.4. Kienle's sender activities . 17 2.5. Organic vs. sponsored search . 29 2.6. Yuan et al.'s online ad ecosystem . 31 2.7. Muthukrishan's ad paths . 32 2.8. WSS communication processes . 38 2.9. STS of web search and advertising . 39 3.1. Lessig's four forces . 50 3.2. Zweig's Chain of Responsibility . 58 3.3. Black Box Chain of Responsibility . 63 3.4. Krafft’s Black Box Concept . 64 4.1. Plugin-Server-Communication . 72 4.2. EDD Registration . 74 4.3. EDD collection process . 75 4.4. Overview all studies . 76 4.5. Overview of PD studies . 77 4.6. Donations over time . 78 4.7. Daily donation distribution . 78 4.8. Donation events . 79 4.9. Donations by individual participant and cumulative submissions 79 4.10. Histogram of donor's donation distributions . 80 4.11. Advertiser histogram . 80 4.12. Top 20 advertisers . 83 4.13. Critical actors in host categories . 84 4.14. Overview of donation statistics . 85 4.15. Proportion of critical ads . 86 4.16. Critical ads by group . 87 4.17. Critical ads by keyword . 87 4.18. Typical examples for swissmedica advertisement creatives . 88 4.19. Keyword suggestions . 90 4.20. Krafft’s Black Box Concept . 92 4.21. Security incidents . 94 A.1. On-bording Process . 136 vii List of Figures A.2. Description of crawled elements . 140 B.1. Firefox plugin statistics . 141 B.2. Daily chrome users . 142 B.3. Cumulative Chrome registrations . 142 B.4. Donations by group . 143 B.5. Absolute Number of advertisements per group . 143 B.6. Fraction of Prescription Treatment Advertisements per group . 144 viii 1. Introduction Digitalization has changed mankind in many ways. One technology that has grown to be indispensable is the search engine. They serve as an entry point to the WWW separating the websites most relevant to a user from noise. Most search engines provide their service to Internet users at no monetary cost. They finance their operation through advertisements (or short: ads) displayed along with search results on the search engine's result page (SERP). They call this sponsored or affiliated search. These integrated search engines (ISEs) combine search utility with the capabilities of an advertising exchange and thus connect advertisers, content providers and searchers. However, most ISEs are privately operated Internet platforms. They rose to be powerful intermediaries that take the role of algorithmic gatekeepers. Not only do they control the flow of communication between users and content providers. On their platform, they also organize ad distribution and direct attention. Concurrently, they get to retain transaction data of all involved participants, e.g. user data, website content, ad efficacy, conversion costs of businesses. In some domains, misconfiguration of algorithms has only minor conse- quences, like irrelevant search results or dysfunctional technical components. When it comes to medicine and health though, digitalization is probably go- ing to have the \most immediate and profound personal and social conse- quences" (Petersen, Tanner, et al., 2019, p.368). ISEs and their advertising partners combine various personal data to explore \the most intimate aspects of our selves" (Petersen, Tanner, et al., 2019, p.368). In the realm of health, they can have immediate effect on the well-being of citizens and their sur- roundings. Although society is heavily affected by privately-operated Internet-based platforms it cannot assess the functionality and safety of those software sys- tems. A Guardian journalist describes this situation as \operating on blind, ignorant, misplaced trust" (Goldacre, 2014) and adds that choices in algorithm design are generally being made without.