Blocklist Babel: on the Transparency and Dynamics of Open Source Blocklisting

Total Page:16

File Type:pdf, Size:1020Kb

Blocklist Babel: on the Transparency and Dynamics of Open Source Blocklisting 1 Blocklist Babel: On the Transparency and Dynamics of Open Source Blocklisting Alvaro´ Feal, Pelayo Vallina, Julien Gamba, Sergio Pastrana, Antonio Nappa, Oliver Hohlfeld, Narseo Vallina-Rodriguez, and Juan Tapiador Abstract—Blocklists constitute a widely-used Internet security are protecting an estimated 3,126,410,000 user mailboxes [1]. mechanism to filter undesired network traffic based on IP/domain Similarly, MISP (Malware Intelligence Sharing Platform [2]), reputation and behavior. Many blocklists are distributed in open a popular open source threat intelligence platform, reports that source form by threat intelligence providers who aggregate and process input from their own sensors, but also from third- it is used by more than 6,000 organizations worldwide. Beyond party feeds or providers. Despite their wide adoption, many blocklists, other feeds can be used by applications to block or open-source blocklist providers lack clear documentation about restrict access. One example are feeds of Tor exit node IPs, their structure, curation process, contents, dynamics, and inter- e.g., used by Wikipedia to block users of the Tor anonymity relationships with other providers. In this paper, we perform network from editing Wikipedia due to abuses in the past [3]. a transparency and content analysis of 2,093 free and open source blocklists with the aim of exploring those questions. To Blocklists can be proprietary (i.e., commercial) or open that end, we perform a longitudinal 6-month crawling campaign source. The former are often available through rate-limited, yielding more than 13.5M unique records. This allows us to shed license-based, or pay-per-use mechanisms and are maintained light on their nature, dynamics, inter-provider relationships, and by for-profit companies specialized in threat intelligence. On transparency. Specifically, we discuss how the lack of consensus the other hand, we define open source blocklists as lists that on distribution formats, blocklist labeling taxonomy, content focus, and temporal dynamics creates a complex ecosystem that are openly and freely available for anyone to collect and use. complicates their combined crawling, aggregation and use. We The ecosystem of open source blocklists is heterogeneous, also provide observations regarding their generally low overlap as with different types of providers, from for-profit companies well as acute differences in terms of liveness (i.e., how frequently that make publicly available—often partially—some of their records get indexed and removed from the list) and the lack of data feeds, to individual users or organizations that publish documentation about their data collection processes, nature and intended purpose. We conclude the paper with recommendations their own collected and curated data. One key aspect of this in terms of transparency, accountability, and standardization. diversity is that different providers follow different methods to collect, curate, maintain, and label their blocklists. Yet Index Terms—Security Management, Functional Areas; Net- work Monitoring and measurements, Methods; Internet Connec- openness does not necessarily imply transparency. Often times, tivity and Internet Access Services, Service Management. their internal data collection and sanitization processes are not sufficiently documented, thus impeding their optimal usage and, ultimately, their widespread adoption. We elaborate on I. INTRODUCTION these issues below: Network management practices often rely on data feeds • Data sources: While some providers operate their own assigning labels to network entities for access control, e.g., sensors and honeypots to detect malicious activities, others feeds provided by threat intelligence platforms to identify and might aggregate records from other providers’ blocklists. block content regarded as malicious, untrustworthy, or simply Yet, many blocklist providers do not document their data bad reputed. Such feeds are commonly regarded as blocklists sources. This impedes blocklist consumers from assessing (or blacklists) and can contain a variety of actionable records, the limitations, representativity, applicability, and scope of including IP addresses, hostnames, URLs, TLS certificate a given feed. Understanding the nature of the records con- fingerprints, or file hashes. The effectiveness of blocklist- tained in a blocklist and the potential overlap between dif- ing to raise the bar against unwanted behavior, traffic, and ferent providers is instrumental for an optimal exploitation, content has driven a steady growth in the threat intelligence particularly for end-users combining multiple feeds into a sharing ecosystem, with a variety of products and platforms single, consolidated database. When open source blocklist streamlining the aggregation, enforcement, and distribution of providers fail to document their data sources, users might actionable data. Even if the market penetration of blocklisting end up aggregating lists compiled for different purposes or is hard to know, Spamhaus–one of the oldest threat intelligence that are indeed indexing the same entries, possibly under providers–states that as of October 6, 2020 their blocklists different tags or labels. • Data curation: Blocklist providers should make available a A.´ Feal, P. Vallina, J. Gamba and N. Vallina-Rodriguez are with IMDEA Networks Institute. clear description of the methodology followed to guarantee A.´ Feal, P. Vallina, J. Gamba, S. Pastrana, A. Nappa and J. Tapiador are that the data they provide is sound and does not include false with Universidad Carlos III de Madrid. positives or outdated records. For instance, a given IP that A. Nappa is with U.C. Berkeley O. Hohlfeld is with Brandenburg University of Technology. was part of a botnet in the past can be benign later down the N. Vallina-Rodriguez is with ICSI. road. Similarly, when new malicious indicators appear, they 2 should be included in well-maintained blocklists as soon as categorization processes can lead to problems to users that try possible. For those blocklists feeding from each other, record to improve their protection by aggregating lists from different propagation across lists might negatively contribute to the providers. This is an under-researched area that deserves a dissemination of false positives, making it harder to remove closer examination given the widespread use of open source records once they are detected as such.1 As a result, the blocklists in operational environments. update frequency of lists, including the addition or removal To address the challenges discussed above, in this paper we of records, is an important characteristic to measure their run an extensive campaign of identification and crawling of liveness and data curation processes. 2,093 open source blocklists from 69 providers (Section III). • Record and blocklist labeling: Open source blocklists Our dataset includes feeds from prominent providers (e.g., are widely heterogeneous in terms of the type of records MISP or Spamhaus) and blocklists used for different cases and contained in their feeds and the potentially harmful behavior applications: from identifying users from anonymity networks that they try to protect against (i.e., tracking, phishing, or to blocking malicious content. Our large-scale and diverse malware). Yet, many providers publish their feeds without dataset allows us to gain a unique view to investigate fun- sufficiently describing their actual purpose or application. damental questions related to the nature, dynamics, pitfalls, While some platforms rely on public taxonomies to label transparency and purpose of the open-source blocklist ecosys- records in their list, many blocklists simply provide a custom tem. Our paper contributions are as follows: label, if any. Even for those that try to label the nature • We inspect the ecosystem of open source blocklists pro- of their data and their data sources, there are substan- viders, analyzing their characteristics and how transparent tial methodological differences (and subjective perceptions) they are (Section IV). We also study the differences across across providers when defining their records. For instance, blocklists in terms of the type of records that they include labeling differences often come down to a granularity issue, (Section V). with some lists using generic labels such as “malicious,” • We analyze the dynamics of the blocklists in our dataset by while others offer a specialized classification system for studying their changes (i.e., record additions and removals) malicious types. These differences in terms of labeling and update rates over time (Section VI). We observe that methodology and strategy are problematic because (i) they 30% of the lists never change and that 7% change at result in instances where a given entry appears in blocklists least daily. In the case of those blocklists that are updated, that have different purposes according to their providers; 12% only add entries, 32% only sporadically remove them and (ii) complicate their comparison and aggregation into (possibly due to sanitization efforts), and 56% both add and more comprehensive threat intelligence feeds. remove records. • We study the relationships between blocklists in terms of The aforementioned reasons call for an empirical analysis content overlap (Section VII) and how records propagate and measurement of the soundness, freshness, dissemination, among multiple blocklists over time (Section VIII). These and nature of the records present in open source blocklists,
Recommended publications
  • ITU Botnet Mitigation Toolkit Background Information
    ITU Botnet Mitigation Toolkit Background Information ICT Applications and Cybersecurity Division Policies and Strategies Department ITU Telecommunication Development Sector January 2008 Acknowledgements Botnets (also called zombie armies or drone armies) are networks of compromised computers infected with viruses or malware to turn them into “zombies” or “robots” – computers that can be controlled without the owners’ knowledge. Criminals can use the collective computing power and connected bandwidth of these externally-controlled networks for malicious purposes and criminal activities, including, inter alia, generation of spam e-mails, launching of Distributed Denial of Service (DDoS) attacks, alteration or destruction of data, and identity theft. The threat from botnets is growing fast. The latest (2007) generation of botnets such as the Storm Worm uses particularly aggressive techniques such as fast-flux networks and striking back with DDoS attacks against security vendors trying to mitigate them. An underground economy has now sprung up around botnets, yielding significant revenues for authors of computer viruses, botnet controllers and criminals who commission this illegal activity by renting botnets. In response to this growing threat, ITU is developing a Botnet Mitigation Toolkit to assist in mitigating the problem of botnets. This document provides background information on the toolkit. The toolkit, developed by Mr. Suresh Ramasubramanian, draws on existing resources, identifies relevant local and international stakeholders, and
    [Show full text]
  • M AAWG: Hosting Past, Present & Future
    3 M AAWG: Hosting Past, Present & Future Justin Lane - BlueHost and M3AAWG Hosting Co-Chair Jesse Sowell, PhD and M3AAWG Advisor 5 May 2016 LACNIC 25, La Habana, Cuba www.m3aawg.org © 2015 Messaging, Malware and Mobile Anti-Abuse Working Group Hosting: Not a Big Issue, Right? For many years the Community looked at Hosting Companies like this picture. We where a small area that did not look to be that important. www.m3aawg.org © 2015 Messaging, Malware and Mobile Anti-Abuse Working Group A Brief History of Hosting Hosting Companies in the beginning where a small part of the overall environment. Most of the Hosting Companies at the time offered plans that gave their clients 20-50MB of space to use for their websites. ESP’s and ISP’s where much bigger players. www.m3aawg.org © 2015 Messaging, Malware and Mobile Anti-Abuse Working Group What Hosting Offers Hosting Companies were able to offer all the services needed to get a company online and ready to service their customers. • Email Services • CMS, and Webdesign • Hardware from Dedicated Servers or Colocation Servers to Shared or Virtual Server Space • Bandwidth for your Business, Dedicated Ips and SSL Services • Access to Hardware that most smaller businesses were not able to afford on their own. www.m3aawg.org © 2015 Messaging, Malware and Mobile Anti-Abuse Working Group What is Happening Today? As the community got better at policing ISP’s, ESP’s and other problem areas, the criminal elements began to migrate to the Hosting Platforms. With Hosting Companies these criminal elements were even better positioned to carry out their plans.
    [Show full text]
  • IFIP AICT 394, Pp
    A Scalable Spam Filtering Architecture Nuno Ferreira1, Gracinda Carvalho1, and Paulo Rogério Pereira2 1 Universidade Aberta, Portugal 2 INESC-ID, Instituto Superior Técnico, Technical University of Lisbon, Portugal [email protected], [email protected], [email protected] Abstract. The proposed spam filtering architecture for MTA1 servers is a component based architecture that allows distributed processing and centralized knowledge. This architecture allows heterogeneous systems to coexist and benefit from a centralized knowledge source and filtering rules. MTA servers in the infrastructure contribute to a common knowledge, allowing for a more rational resource usage. The architecture is fully scalable, ranging from all-in- one system with minimal components instances, to multiple components instances distributed across multiple systems. Filtering rules can be implemented as independent modules that can be added, removed or modified without impact on MTA servers operation. A proof-of-concept solution was developed. Most of spam is filtered due to a grey-listing effect from the architecture itself. Using simple filters as Domain Name System black and white lists, and Sender Policy Framework validation, it is possible to guarantee a spam filtering effective, efficient and virtually without false positives. Keywords: spam filtering, distributed architecture, component based, centralized knowledge, heterogeneous system, scalable deployment, dynamic rules, modular implementation. 1 Introduction Internet mail spam2 is a problem for most organizations and individuals. Receiving spam on mobile devices, and on other connected appliances, is yet a bigger problem, as these platforms are not the most appropriate for spam filtering. Spam can be seen as belonging to one of two major categories: Fraud and Commercial.
    [Show full text]
  • 9 Things Every Marketer Must Know About Email
    THINGS EVERY MARKETER MUST KNOW ABOUT EMAIL with Greg Kraios & Chris Arrendale Inbox Pros TABLE OF CONTENTS CHAPTERS 1 Deliverability Branding 3 2 Shared IP vs. Dedicated IP 7 3 Feedback Loops 13 4 No-Reply's 18 5 Setting Subscriber Expectations 21 6 Single Opt-In vs. Double Opt-In 29 7 Permission Marketing 37 8 Email Authentication 44 9 Email Throttling 50 250ok.com | InboxPros.com 2 Your domain is your identity. Write that down on a Post-It Note and stick it to your monitor. Now repeat after us, “my domain is my identity.” DELIVERABILITY It’s the identity that your customers, the recipients of your marketing and transactional messages, have come to trust, or believe they can. It’s this trust that you must BRANDING: YOUR guard as if your business depends on it because it often DOMAIN IS YOUR does. How so? In the 2015http://www.pardot.com/blog/the-2015-email-marketing-landscape/ State of Marketing Report, the Salesforce survey of 5,000 marketers revealed 20% of IDENTITY the audience reported their business primary revenue source is directly linked to email operations. Those people saying email is dead? Tell them to put that stat in their pipe and smoke it (in locations where that’s legal, of course). So, in an eort to get your identity in order, we are going to focus on the Display Name or Friendly From, From Address, and the links used within the email. EMAIL, STILL NOT DEAD 3 1 | DELIVERABILITY BRANDING THE INBOX EXPERIENCE The Friendly From, subject line, and preview text are the In a perfect world, an intuitive and consistent first connections your message has with a subscriber.
    [Show full text]
  • Contents Software Description:
    Swift Email Processor v2.0 www.webemailverifier.com Support: www.webemailverifier.com/supportsuite/ Contents Software Description: ................................................................................................................................... 3 Program GUI Screenshots: ............................................................................................................................ 4 Program Features: ........................................................................................................................................ 9 Chapter 1: Installation and Settings Configuration ..................................................................................... 11 Configuration and Settings: .................................................................................................................... 12 Chapter 2: Using the Proxy Servers Module ............................................................................................... 31 Chapter 2.1: X-Originating-IP email header & Proxy Acceptable Usage Policy .......................................... 35 Chapter 3: Sending Messages using the Sender Module ........................................................................... 36 3.1: Unsubscribe link placements ........................................................................................................... 49 3.2: Installing and using the MySQL/MariaDB CRUD API script or application ...................................... 50 3.2.1 Installing MySQL API on Linux Server running
    [Show full text]
  • Messaging, Malware and Mobile Anti-Abuse Working Group M3AAWG Best Current Practices for Building and Operating a Spamtrap Version 1.2.0 Updated August 2016
    Messaging, Malware and Mobile Anti-Abuse Working Group M3AAWG Best Current Practices For Building and Operating a Spamtrap Version 1.2.0 Updated August 2016 Table of Contents INTRODUCTION .............................................................................................................................................. 1 SPAMTRAP GOALS/PURPOSE ........................................................................................................................ 2 SPAMTRAP ADDRESSES .................................................................................................................................. 2 ISSUES WITH ADDRESSES ............................................................................................................................................................. 3 IMPLEMENTATION CONSIDERATIONS .................................................................................................... 3 ANALYSIS ............................................................................................................................................................ 4 SECURITY ........................................................................................................................................................... 5 INFORMATION SHARING .............................................................................................................................. 5 HINTS AND PITFALLS ....................................................................................................................................
    [Show full text]
  • Das Umfassende Praxis-Handbuch« (ISBN 978-3-8266-5095-6) 2013 by Verlagsgruppe Hüthig Jehle Rehm Gmbh, Heidelberg
    Stichwortverzeichnis Numerisch Links 95 8Seconds 576 Marketing 56, 189 Netzwerk 189 A Relations 83, 85 Abmeldung 259, 378, 719, 746, 806 AIDA 335 mentale 260, 496, 530 Akkreditierung 836 Above-the-Scroll 118, 682 Aktionstag 237 Absender 681 Aktionstage 237 Auswahl 411 Aktivierung 424 Fälschung siehe Spoofing Alternativhypothese 556 siehe Wechsel 111, 413 Alt-Text Grafik-Alternativtext siehe Abuse Reporting Format siehe ARF America Online AOL AddThis 639 Anbieterauswahl 94 siehe AddToAny 640 Anhang E-Mail-Anhang Adobe AniMake 447 Photoshop 445 Animation 123, 367, 443, 460, 612, 616 Adress Anmeldeformular Harvesting 799 Anrede 137, 235 Miete 78 Auswahlliste 164 Validierung 817 Bezeichner 157 siehe Adressbuch-Eintrag 208, 345, 843 Call-to-Action Handlungsaufforde- Adresse rung Adressbuch 208 Fehlermeldung 164 Alter 196 Geburtsdatum 134, 136, 144, 163 Bestand siehe Verteiler Gruppierung 153 catch-all 814 JavaScript 168 Ermüdung 260 kontextsensitives 130 Gewinnung 74 Lückentext 166 Kauf 175, 186 Optionsfeld 125, 160, 163 Markt 199 Redundanzfreiheit 137 Miete 66, 192 Reihenfolge 154, 159 Qualität 144, 183 Textfeld 148, 161 Quantität 183 Tooltip 161 Quelle 175, 183, 230 Validierung 165 Validierung 217 Video 181 Advanced Research Projects Agency Network Anrede 354 siehe ARPAnet Anzeigenwerbung 79, 180, 192, 308, 565, Affiliate 657, 741 897 © des Titels »E-Mail-Marketing – Das umfassende Praxis-Handbuch« (ISBN 978-3-8266-5095-6) 2013 by Verlagsgruppe Hüthig Jehle Rehm GmbH, Heidelberg. Nähere Informationen unter: http://www.mitp.de/5095 Stichwortverzeichnis
    [Show full text]
  • Modul 42 Spam
    Modul 42 Spam Studienbrief 1: Grundlagen Studienbrief 2: Spam-Techniken Studienbrief 3: Anti-Spam Techniken Studienbrief 4: Rechtslage Autoren: Christopher Wolf Sebastian Uellenbeck 1. Auflage Ruhr-Universität Bochum © 2013 Ruhr-Universität Bochum Universitätsstraße 150 44801 Bochum 1. Auflage (17. Juli 2013) Didaktische und redaktionelle Bearbeitung sowie Produktion: Romy Rahnfeld (M. A. Germanistik Sprachwissenschaft/Erziehungswis- senschaft) Das Werk einschließlich seiner Teile ist urheberrechtlich geschützt. Jede Ver- wendung außerhalb der engen Grenzen des Urheberrechtsgesetzes ist ohne Zustimmung der Verfasser unzulässig und strafbar. Das gilt insbesondere für Vervielfältigungen, Übersetzungen, Mikroverfilmungen und die Einspe- icherung und Verarbeitung in elektronischen Systemen. Inhaltsverzeichnis Seite3 Inhaltsverzeichnis Einleitung zu den Studienbriefen6 I. Abkürzungen der Randsymbole und Farbkodierungen.......6 II. Zu den Autoren..............................7 III. Modullehrziele..............................8 Studienbrief 1 Grundlagen9 1.1 Lernziele..................................9 1.2 Einleitung.................................9 1.2.1 E-Mail...............................9 1.2.2 Spam................................ 11 1.2.3 RFC (Request for Comments).................. 15 1.2.4 Gliederung............................ 16 1.2.5 Kontrollaufgaben......................... 16 1.3 Internet................................... 17 1.4 E-Mail-Infrastruktur........................... 17 1.4.1 Kommunikationsmodell..................... 18
    [Show full text]
  • Advances in the Detection of Malicious Servers and the Analysis of Client-Side Vulnerabilities
    UNIVERSIDAD POLITECNICA´ DE MADRID ESCUELA TECNICA´ SUPERIOR DE INGENIEROS INFORMATICOS´ Defending Against Cybercrime: Advances in the Detection of Malicious Servers and the Analysis of Client-Side Vulnerabilities PH.D THESIS Antonio Nappa Copyright c February 2016 by Antonio Nappa DEPARTAMENTAMENTO DE LENGUAJES Y SISTEMAS INFORMATICOS´ E INGENIERIA DE SOFTWARE ESCUELA TECNICA´ SUPERIOR DE INGENIEROS INFORMATICOS´ Defending Against Cybercrime: Advances in the Detection of Malicious Servers and the Analysis of Client-Side Vulnerabilities SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF: Doctor en Inform´atica Author: Antonio Nappa Advisor Dr. Juan Caballero February 2016 Jury: Somesh Jha, Professor of Computer Sciences - University of Wisconsin-Madison Lorenzo Cavallaro, Senior Lecturer of Computer Sciences - Royal Holloway University of London Juan Manuel Est´evez Tapiador, Profesor Titular de Universidad - Universi- dad Carlos III de Madrid Victor A. Villagr´a, Profesor Titular de Universidad - Universidad Polit´ecnica de Madrid Boris K¨opf, Assistant Research Professor - IMDEA Software Institute Carmela Troncoso, Researcher - IMDEA Software Institute Manuel Carro, Profesor Titular de Universidad - Universidad Polit´ecnica de Madrid Resumen de la tesis Esta tesis se centra en el an´alisisde dos aspectos complementarios de la ciberdelin- cuencia (es decir, el crimen perpetrado a trav´esde la red para ganar dinero). Estos dos aspectos son las m´aquinasinfectadas utilizadas para obtener beneficios econ´omicosde la delincuencia a trav´esde diferentes acciones (como por ejemplo, clickfraud, DDoS, correo no deseado) y la infraestructura de servidores utiliza- dos para gestionar estas m´aquinas(por ejemplo, C & C, servidores explotadores, servidores de monetizaci´on,redirectores). En la primera parte se investiga la exposici´ona las amenazas de los orde- nadores victimas.
    [Show full text]
  • M3AAWG Complaint Feedback Loop Best Current Practices
    Messaging Anti-Abuse Working Group (MAAWG) Complaint Feedback Loop Best Current Practices April 2010 Introduction Complaint Feedback Loops similar to those described herein have existed for more than a decade, resulting in many de facto standards and best practices. This document is an attempt to codify, and thus clarify, the ways that both providers and consumers of these feedback mechanisms intend to use the feedback, describing some already-common industry best practices. This paper is the result of cooperative efforts within the Messaging Anti-Abuse Working Group. MAAWG is the largest global industry association working against Spam, viruses, denial-of-service attacks and other online exploitation. It members include ISPs, network and mobile operators, key technology providers and volume sender organizations. It represents over one billion mailboxes worldwide and its membership contributed their expertise in developing this description of current Feedback Loop practices. Table of Contents Glossary of Standard Terms ...................................................................................................................................................2 Overview.................................................................................................................................................................................5 Mailbox Providers and Feedback Providers ..........................................................................................................................6 Benefits of Providing Feedback
    [Show full text]
  • Mail Assure Admin Guide
    Mail Assure User Guide - Admin, Domain and Email Level Last Updated: Monday, February 11, 2019 ----------- ©2019 CONTENTS Getting Started 1 Useful Links 1 Accessing Mail Assure 1 User Permissions 3 Timeout Settings 4 Supported Web Browsers 4 Multi-admin control panel access and audit trail 4 Retrieve Log-in Link 6 Existing Account - Forgotten Password 7 Create New Email Account via the Retrieve Log-in Link 7 Send User Customized Link to Reset Password 7 Admin Level Control Panel 7 What do you want to do? 7 Domain Level Control Panel 8 Accessing the Domain Level Control Panel 8 Email Level Control Panel 8 Accessing the Email Level Control Panel 9 What do you want to do? 9 Using the 'Login as' Feature 9 Log in as a Domain user. 9 Log in as an Email User 10 Finish Login as Session 11 Start a Free 30-day Trial with Mail Assure 11 Application Overview 12 Navigating Mail Assure 12 Customize Dashboard 12 Mail Assure Edit Dashboard Panel Group 13 Add a Widget 13 Reset to Default Dashboard Setting 13 Features Preview 14 List of features in Preview mode 14 API Logs (Preview) 14 Archive Usage (Preview) 15 Email Scout Report Templates (Preview) 16 Create Email Scout Report Template 17 View Incoming/Outgoing Reports from a Particular Template 25 On-demand Archive Index 26 What's New 27 New Navigation/Dashboard Feature Mapping 36 Admin Level Mappings 36 Domain Level Mappings 41 Email Level Mappings 45 FAQs 47 Troubleshooting Tips 49 Incoming mail is wrongly blocked 49 Incoming spam is getting through 49 Outgoing mail is wrongly blocked 50 Report Security
    [Show full text]