Analysis of SSH Executables
Masaryk University
Faculty of Informatics

Analysis of SSH executables

Bachelor’s Thesis

Tomáš Šlancar

Brno, Fall 2019

This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Tomáš Šlancar

Advisor: RNDr. Daniel Kouřil, Ph.D.

Acknowledgements

I would like to thank my advisor, RNDr. Daniel Kouřil, Ph.D., very much for his excellent guidance, patience, and time. I appreciate every piece of advice he gave me. I would also like to thank him and ESET for providing the collection of infected SSH executables. Last but not least, I would like to thank my family for all their support. They gave me a great environment while I was writing the thesis, and I appreciate it very much.

Abstract

The thesis describes the dynamic analysis of SSH executables, with the main focus on the client side. Some smaller aspects relate to the server side as well. The examination is performed only at the level of system calls and only on Linux. The purpose is to propose a series of steps and create tools that help determine whether an unknown SSH binary contains malicious code or is safe, and whether this decision can be automated. The setup, analysis, and determination parts are performed with minimal user effort. The whole process should be at least semi-automatic, and future steps toward better automation are mentioned. The thesis ends with an evaluation of the proposed mechanisms and ideas for better results. Correctness is tested on a collection of samples provided by ESET.
Keywords

SSH, malware, dynamic analysis, Sysdig, Docker, system calls

Contents

Introduction
1 Collecting system calls
  1.1 Dynamic analysis
  1.2 Sysdig
  1.3 Docker
  1.4 Description of the implementation
2 The Data analysis
  2.1 The approach
    2.1.1 SSH client
    2.1.2 SSH server
  2.2 Description of the implementation
3 ESET samples and their analysis
  3.1 Overview of samples
  3.2 Success Rate
  3.3 Problems during analysis
  3.4 Ideas for improvements
4 Conclusion
Bibliography
A List of electronic attachments

Introduction

The analysis of executables is a broad field of research. Its point is to decide whether a program is doing what it is supposed to do. There is a thin line between malicious and legitimate behavior. An excellent example is a rootkit that hides its processes. Such hiding looks malicious, but for antivirus software it is a necessity. Therefore, a detailed description of the program's behavior is needed.

There are many ways to perform an examination. In general, they fall into two categories: static and dynamic analysis. This thesis focuses only on system calls, which fall under the category of dynamic analysis. There are more than enough tools for this job, but the vast majority target Windows only. The goal here is to perform the analysis on Linux and for one of the most used programs on this operating system: SSH.

A few tools exist, such as Cuckoo, probably the most used and versatile one, or more Linux-specific tools such as strace and Sysdig. Other programs are often built upon the Cuckoo core. Cuckoo is a useful tool primarily because it combines many inputs, such as network traffic, system calls, library calls, checks against VirusTotal, and more.
Its disadvantage is the setup, which is complicated and takes a long time. On the other hand, strace and Sysdig collect only system calls and leave the decision to the user. Their logs can be extensive, and it can take a long time to spot any malicious behavior in them. Strace can also be detected. Sysdig, however, receives all system calls happening in the system, whereas strace collects only those of the traced executable. Sysdig also offers a great variety of filters, some even for Docker containers. This comes in handy mainly because Docker containers provide a dynamic environment setup and good enough isolation [1], including network isolation. As a result, this thesis uses Docker containers in combination with Sysdig to collect the data. Both are described in more detail in the first chapter, together with the script that combines them.

The second chapter focuses on the evaluation of the data, the description of the approach, and the implementation. The SSH client has the advantage that it has no reason for parallelism, and therefore no need for fork, clone, additional threads, and the like. Data collected from the SSH client in the same environment with the same steps will therefore be identical. For the SSH server, this is not the case. However, the analysis focuses more on the client than on the server. Nevertheless, the server system calls are still collected and tested. This data can at least provide more insight into the evaluation and can point the way to possible success.

A more detailed report is in the third chapter, along with the description of the SSH samples. The collection provided by ESET contains both SSH clients and SSH servers. The architectures in the collection are also very diverse. Some samples had to be eliminated because binaries for their architecture could not be executed. Others were eliminated because their binaries were too hard to execute for various reasons.
The problems connected with running the samples are also described in the third chapter. The third chapter also discusses the success of this approach on the dataset provided by ESET and ideas for future development.

1 Collecting system calls

This thesis focuses on system calls, which fall under the category of dynamic analysis. Therefore, all of the malware's behavior at this level needs to be monitored and captured. System call collection is offered by a tool named Sysdig. It is a system-level monitoring tool with native container support that combines the functionality of programs such as strace, tcpdump, htop, iftop, lsof, and more [2]. Nevertheless, the binary must be executed to collect any data.

Executing the malware in an isolated environment is essential. The main reasons are to prevent any real damage and to restrict any further spreading. Docker provides a secure way of confining the malware in a container while remaining flexible enough for quick changes. Spawning a new container is quick compared to using VirtualBox¹.

Then the only thing left is to combine both tools and control them from one place. For this purpose, a simple Python script was written. As a result, it is always possible to prepare the same environment and to execute SSH with the same steps in many ways.

Both Sysdig and Docker are open-source projects; their source code is available on GitHub.

1.1 Dynamic analysis

Dynamic analysis is a form of analysis done by running the actual malware and observing its behavior. The observation can happen at various levels, but running the program is the one common attribute. The highest level is user interaction, which means running the program and seeing what it visibly does. At a lower level is the examination of RAM, processes, the registry, network traffic, and more. At the lowest level is the examination of library calls and system calls. Each level and each tool uses a different approach to getting its data.
This fact gives the investigator the option of choosing the right tool for a specific job, because each tool has its advantages.

¹ VirtualBox is a powerful virtualization tool by Oracle. It runs on Windows, Linux, and macOS.

Running malware means one thing: the environment will be infected, and everything on the machine will be compromised. The malware could then even spread further, so, as a precaution, the execution needs to take place in a contained environment. Achieving such an environment usually means using some form of virtualization; VirtualBox, for example, is often used.

Dynamic analysis gives an investigator a way to examine malware, but it comes with problems. Triggering the malware is one of them. Malware usually tries to hide from detection. It can even check whether it is running inside a virtual environment or whether known detection tools are running alongside it. Having many detection tools that collect various inputs gives the investigator a better picture of the actual behavior of the malware. One such tool is Cuckoo, which collects system calls, library calls, network traffic, memory, and more.

This thesis focuses only on system calls. A system call is the way a program communicates with the operating system. Accessing memory, accessing files, all networking, creating processes, and more are jobs for the operating system or, more precisely, for the kernel. Given all the information available in system calls, they are a good starting point for any analysis. However, they can contain too much irrelevant data. In the case of SSH, there is a good chance of seeing the malware store the stolen credentials somewhere.
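The last point can be illustrated with a minimal Python sketch of how storing credentials might show up at the system call level. It is not the implementation described in this thesis. The record type SyscallEvent, the allow list EXPECTED_WRITE_PREFIXES, the function suspicious_writes, and all paths are hypothetical and chosen only for illustration; real captured data would first have to be parsed into such records.

```python
import os
from typing import Iterable, List, NamedTuple


class SyscallEvent(NamedTuple):
    """Hypothetical, simplified record of one captured system call."""
    process: str  # name of the process that made the call
    name: str     # system call name, e.g. "open" or "openat"
    path: str     # file path the call refers to
    flags: int    # open flags, expressed with the os.O_* constants


# Assumption: paths a clean SSH client is expected to open for writing.
EXPECTED_WRITE_PREFIXES = ("/home/user/.ssh/known_hosts", "/dev/tty")


def suspicious_writes(events: Iterable[SyscallEvent],
                      process: str = "ssh") -> List[SyscallEvent]:
    """Return files opened for writing by `process` outside the allow list."""
    write_mask = os.O_WRONLY | os.O_RDWR
    hits = []
    for ev in events:
        # Only file-opening calls made by the examined process matter here.
        if ev.process != process or ev.name not in ("open", "openat"):
            continue
        opened_for_writing = bool(ev.flags & write_mask)
        expected = ev.path.startswith(EXPECTED_WRITE_PREFIXES)
        if opened_for_writing and not expected:
            hits.append(ev)
    return hits


# Made-up trace: three calls from the SSH client, one from another process.
trace = [
    SyscallEvent("ssh", "openat", "/etc/ssh/ssh_config", os.O_RDONLY),
    SyscallEvent("ssh", "openat", "/home/user/.ssh/known_hosts",
                 os.O_RDWR | os.O_CREAT),
    SyscallEvent("ssh", "openat", "/tmp/.cache_x",
                 os.O_WRONLY | os.O_CREAT | os.O_APPEND),
    SyscallEvent("bash", "openat", "/tmp/out.txt", os.O_WRONLY),
]

for ev in suspicious_writes(trace):
    print(ev.process, ev.name, ev.path)
```

In this made-up trace, only the write to /tmp/.cache_x is reported. Reads of the configuration file, the expected update of known_hosts, and calls made by other processes are treated as irrelevant data and filtered out.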