Large-Scale Detection and Measurement of Malicious Content
Total Page:16
File Type:pdf, Size:1020Kb
Large-Scale Detection and Measurement of Malicious Content Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universität Mannheim vorgelegt von Jan Gerrit Göbel aus Köln, Deutschland Mannheim, 2011 Dekan: Professor Dr. Wolfgang Effelsberg, Universität Mannheim Referent: Professor Dr. Felix Christoph Freiling, Universität Mannheim Korreferent: Professor Dr. Christopher Kruegel, University of California, Santa Barbara Tag der mündlichen Prüfung: 27. April 2011 Abstract Many different network and host-based security solutions have been developed in the past to counter the threat of autonomously spreading malware. Among the most common detection techniques for such attacks are network traffic analysis and the so-called honeypots. In this thesis, we introduce two new malware detection sensors that make use of the above mentioned techniques. The first sensor called Rishi, passively monitors network traffic to automatically detect bot infected machines. The second sensor called Amun follows the concept of honeypots and detects malware through the emulation of vulnerabilities in network services that are commonly exploited. Both sensors were operated for two years and collected valuable data on autonomously spreading malware in the Internet. From this data we were able to, for example, study the change in exploit behavior and derive predictions about preferred targets of todays’ malware. Zusammenfassung In der Vergangenheit wurden viele Sicherheitslösungen zur Bekämpfung von sich autonom verbreitenden Schadprogrammen entwickelt. Einige von diesen Lösungen set- zen lokal an einem Rechner an, andere hingegen an Netzen und deren Datenverkehr. Zu den bekanntesten Erkennungstechniken gehören die Analyse des Netzverkehrs und sogenannte Hon- eypots. In dieser Arbeit stellen wir zwei neue Sensoren zur Erkennung von Schadprogrammen vor, die die eben genannten Techniken verwenden. Der erste Sensor, genannt Rishi, lauscht passiv an einem Netz und erkennt durch die Analyse des Datenverkehrs Rechner, die mit einem Bot infiziert sind. Der zweite Sensor ist Amun. Dies ist ein Honeypot und erkennt Schadprogramme durch die Emulation von oft ausgenutzten Schwachstellen in Netzwerkdiensten. Beide Sensoren wur- den über zwei Jahre hinweg betrieben und haben in dieser Zeit wertvolle Informationen über sich autonom verbreitende Schadprogramme im Internet gesammelt. Zum Beispiel konnten wir Verän- derungen im Exploit-Verhalten feststellen und Aussagen über zukünftige Angriffsziele ableiten. “A spoonful of honey will catch more flies than a gallon of vinegar.” Benjamin Franklin [Hal04] Acknowledgements First of all, I would like to thank my advisor Prof. Dr. Freiling, who gave me the opportu- nity to work in this exciting area of computer science and supported my research in every way. Throughout this thesis, he always provided me with valuable feedback and I never had a question unanswered. Thank you for this excellent guidance. Furthermore, I would like to thank Prof. Dr. Christopher Krügel for accepting to be my co- advisor. Although, we have never really met in person his name and work is well-known in the context of malware detection and analysis. Thus, i am honoured that he accepted to be my co- advisor. I would also like to thank all members of the Network Operations Center (NOC) at RWTH Aachen University, especially Jens Hektor, who allowed me to deploy and maintain one of the biggest Honeynets in Germany. Without the information collected at this network during the last years all the work presented in this thesis would not have been possible. In this context, I would also like to thank Jens Syckor from TU Dresden, Matteo Cantoni from Italy, and Eric Chio from China, who all thankfully provided me with their honeypot sensor data and thus enabled me to make predictions about optimal sensor placement strategies with respect to geographical distant sensors. There are also many other people I would like to thank for supporting me in the development of various aspects of my work. First of all, I would like to thank Philipp Trinius for writing some interesting papers with me and providing me with valuable feedback and suggestions, but also for being an uncomplicated co-worker and friend. This last statement holds actually true for all people working at the Laboratory for Dependable Distributed Systems of Mannheim University that I met during this time. Without the assistance of Michael Becher, Zina Benenson, Andreas Dewald, Markus Engelberth, Christian Gorecki, Thorsten Holz, Philipp Trinius, and Carsten Willems it would not have been so much fun. Part of this work also benefited from the valuable feedback and enhancements of Stefan Vömel who volunteered proof-reading one of the chapters. I would also like to thank all diploma/bachelor students and student workers, who helped over the years to implement some parts of the work presented in this thesis. In particular, Ben Stock and Matthias Luft helped a lot to develop and implement tools to evaluate the large amount of Honeynet data we have collected. Both the atmosphere and cheerfulness at the Laboratory for Dependable Distributed Systems of the University of Mannheim, as well as the great cooperation with the Center for Computing and Communication of RWTH Aachen University was amazing and tremendously contributed to the vii outcome of this thesis. It was a pleasure working with all of you. Finally, I have to express my gratitude to my wife and my son. Without her doing most of the work at home I could not have spent so much time working on this thesis. But they both were also very helpful in distracting me and helping me not to forget that there is an everyday life next to work. Contents Acknowledgements List of Figures v List of Tables ix List of Listings xi 1 Introduction 1 1.1 Introduction . .1 1.2 Motivation . .2 1.3 Contributions . .2 1.3.1 Malware Sensors . .2 1.3.2 Large-Scale Data Evaluation . .3 1.4 Thesis Outline . .4 1.5 List of Publications . .5 2 Background 7 2.1 Introduction . .7 2.2 Internet Relay Chat . .8 2.3 Bots and Botnets . .9 2.4 Honeypots and Honeynets . 11 2.4.1 Honeypot Definition . 12 2.4.2 Low- and High-Interaction Honeypots . 12 2.4.3 Physical and Virtual Honeypots . 16 2.4.4 Client and Server Honeypots . 17 2.5 Exploits and Shellcode . 18 2.5.1 Buffer Overflow . 19 2.5.2 Shellcode Obfuscation Techniques . 21 2.6 Summary . 25 3 Related Work 27 3.1 Introduction . 27 i Contents 3.2 IRC Botnet Detection . 27 3.3 Low-Interaction Honeypots . 28 3.4 Early Warning Systems . 30 3.5 Large-Scale Evaluation of Incident Information . 32 3.6 Summary . 33 4 No-Interaction Malware Sensor 35 4.1 Introduction . 35 4.2 Concept and Methodology . 36 4.3 Rishi Botnet Detection . 37 4.4 Implementation Details of Rishi . 40 4.4.1 The Scoring Function . 40 4.4.2 Regular Expression . 41 4.4.3 Whitelisting . 42 4.4.4 Blacklisting . 43 4.4.5 Rishi Configuration . 44 4.5 Limitations . 47 4.6 Selected Events Monitored with Rishi . 49 4.6.1 Case Study: Detecting Spam-Bots . 49 4.6.2 Case Study: Spotting Botnet Tracking . 50 4.7 Rishi Webinterface . 51 4.8 Summary . 55 5 Low-Interaction Malware Sensor 57 5.1 Introduction . 57 5.2 Concept and Methodology . 58 5.3 Amun Honeypot . 61 5.4 Implementation Details of Amun . 63 5.4.1 Amun Configuration . 63 5.4.2 Amun Kernel . 67 5.4.3 Request Handler . 68 5.4.4 Vulnerability Modules . 69 5.4.5 Shellcode Analyser . 81 5.4.6 Command-Shell Module . 83 5.4.7 Download Modules . 83 5.4.8 Submission Modules . 86 5.4.9 Logging Modules . 89 5.5 Limitations . 90 5.6 Selected Events Monitored with Amun . 92 5.6.1 Case Study: Amun Webserver Emulation . 92 5.6.2 Case Study: Palevo worm . 94 5.7 Summary . 98 6 Malware Sensor Infrastructure 101 6.1 Introduction . 101 6.2 InMAS Infrastructure Overview . 102 6.3 Malware Capture . 103 6.4 Malware Repository . 107 ii Contents 6.5 Malware Analysis . 108 6.6 Webinterface . 110 6.7 Summary . 114 7 Malware Sensor Evaluation 117 7.1 Introduction . 117 7.2 Rishi Evaluation . 118 7.2.1 Data Points and Definitions . 118 7.2.2 Measurement Periods . 119 7.2.3 Long-term Investigation of Sandbox Traffic . 119 7.2.4 Selected Aspects of Collected Botnet Data . 127 7.3 Amun Evaluation . 130 7.3.1 Data Points and Definitions . 130 7.3.2 Measurement Periods . 131 7.3.3 Honeypot Sensors . 131 7.3.4 Database Layout . 134 7.3.5 Long-term Investigation of the Aachen Honeynet . 136 7.3.6 Short-term comparison of Honeynet Attacks . 149 7.4 Summary . ..