Mis-shapes, Mistakes, Misfits: An Analysis of Domain Classification Services

Pelayo Vallina (IMDEA Networks Institute / Universidad Carlos III de Madrid)
Victor Le Pochat (imec-DistriNet, KU Leuven)
Álvaro Feal (IMDEA Networks Institute / Universidad Carlos III de Madrid)
Marius Paraschiv (IMDEA Networks Institute)
Julien Gamba (IMDEA Networks Institute / Universidad Carlos III de Madrid)
Tim Burke (imec-DistriNet, KU Leuven)
Oliver Hohlfeld (Brandenburg University of Technology)
Juan Tapiador (Universidad Carlos III de Madrid)
Narseo Vallina-Rodriguez (IMDEA Networks Institute / ICSI)

ABSTRACT
Domain classification services have applications in multiple areas, including cybersecurity, content blocking, and targeted advertising. Yet these services are often a black box in terms of their methodology for classifying domains, which makes it difficult to assess their strengths, aptness for specific applications, and limitations. In this work, we perform a large-scale analysis of 13 popular domain classification services on more than 4.4M hostnames. Our study empirically explores their methodologies, scalability limitations, label constellations, and their suitability for academic research as well as other practical applications such as content filtering. We find that the coverage varies enormously across providers, ranging [...]

ACM Reference Format:
Pelayo Vallina, Victor Le Pochat, Álvaro Feal, Marius Paraschiv, Julien Gamba, Tim Burke, Oliver Hohlfeld, Juan Tapiador, and Narseo Vallina-Rodriguez. 2020. Mis-shapes, Mistakes, Misfits: An Analysis of Domain Classification Services. In ACM Internet Measurement Conference (IMC ’20), October 27–29, 2020, Virtual Event, USA. ACM, New York, NY, USA, 21 pages. https://doi.org/10.1145/3419394.3423660

1 INTRODUCTION
The need to classify websites became apparent in the early days of the Web. The first generation of domain classification services appeared in the late 1990s in the form of web directories.