A Lustrum of Malware Network Communication: Evolution and Insights

Chaz Lever†, Platon Kotzias∗, Davide Balzarotti∓, Juan Caballero∗, Manos Antonakakis‡

{chazlever, [email protected], [email protected], {platon.kotzias, [email protected]

† Georgia Institute of Technology, School of Computer Science
‡ Georgia Institute of Technology, School of Electrical and Computer Engineering
∗ IMDEA Software Institute
∓ EURECOM

Abstract—Both the operational and academic security communities have used dynamic analysis sandboxes to execute malware samples for roughly a decade. Network information derived from dynamic analysis is frequently used for threat detection, network policy, and incident response. Despite these common and important use cases, the efficacy of the network detection signal derived from such analysis has yet to be studied in depth. This paper seeks to address this gap by analyzing the network communications of 26.8 million samples that were collected over a period of five years.

Using several malware and network datasets, our large scale study makes three core contributions. (1) We show that dynamic

analysis—focusing on topics like the role of cloud providers, the infrastructure behind drive-by downloads, or the domains used by few malware families.

To shed light on this important problem, we report the results of a five year, longitudinal study of dynamic analysis traces collected from multiple (i.e., two commercial and one academic) malware feeds. These feeds contain network information extracted from the execution of more than 26.8 million unique malware samples. We complement this dataset with over five billion DNS queries collected from a large North American internet service provider (ISP).