An In-Memory Embedding of Cpython for Offensive Use
Total Page:16
File Type:pdf, Size:1020Kb
An In-memory Embedding of CPython for Offensive Use Ateeq Sharfuddin, Brian Chapman, Chris Balles SCYTHE fateeq, brian, [email protected] Abstract—We offer an embedding of CPython that runs using packagers like Py2Exe [6] or PyInstaller [4]. These entirely in memory without “touching” the disk. This in-memory malware are easy to construct and are executed when a coerced embedding can load Python scripts directly from memory instead end-user double-clicks and runs the packaged executable file these scripts having to be loaded from files on disk. Malware that resides only in memory is harder to detect or mitigate against. from disk. Security products have collected and analyzed We intend for our work to be used by security researchers to numerous sample executable files of this scheme. As a result, rapidly develop and deploy offensive techniques that is difficult security products have produced signatures for this scheme and for security products to analyze given these instructions are are able to successfully block all malware using this scheme. in bytecode and only translated to machine-code by the inter- In this paper, we contribute the first embedding of CPython preter immediately prior to execution. Our work helps security researchers and enterprise Red Teams who play offense. Red interpreter that runs entirely in memory without “touching” Teams want to rapidly prototype malware for their periodic the disk. We also contribute a scheme to support loading campaigns and do not want their malware to be detected by the and running Python modules including native C extensions, Incident Response (IR) teams prior to accomplishing objectives. also residing only in memory, onto this interpreter. We are Red Teams also have difficulty running malware in production able to achieve these results without needing any file or from files on disk as modern enterprise security products emulate, inspect, or quarantine such executables given these files have no dependency to reside on disk, unlike the Py2Exe or PyInstaller reputation. Our work also helps enterprise Hunt and IR teams packaging schemes. We have not come across any other in- by making them aware of the viability of this type of attack. memory embeddings of Python. Constructing this CPython Our approach has been in use in production for over a year and embedding was an engineering effort: We claim no new meets our customers’ needs to quickly emulate threat-actors’ algorithms or techniques. Our paper details the necessary steps tasks, techniques, and procedures (TTPs). to successfully load and utilize an in-memory embedding of Keywords-Computer security, Red Team, in-memory, Python CPython. Our goal is to assist security researchers and enterprise Red I. INTRODUCTION Teams. Enterprise Read Teams are responsible for emulating Different classes of threat-actors have different motivations, numerous classes of threat-actors. Red Teams prefer to repur- and therefore, produce and utilize different types of malware to pose existing malware or open-sourced security experiments. complete their goals. Nation-states are the most sophisticated Red Teams have deadlines to meet and run campaigns with threat-actors and go through great lengths to ensure the end- periodicity: As such, they want to emulate different threat- user is not alerted to their intrusions. As a consequence, actors quickly, and this is more time-consuming to accomplish whereas cyber-criminals are not so concerned about leaving in a binary compiled from C or assembly than executing artifacts of compromise behind, malware produced by nation- statements in a Python interpreter. These malware also need to states aim to leave no artifact behind and have built-in security evade Hunt and Incident Response teams until the objectives arXiv:2103.15202v2 [cs.CR] 4 May 2021 hindering forensic analysis. One method threat-actors use is to for the campaign are at least partially met. Red Teams also keep much of the running malware only in volatile memory, have difficulty running malware from files on disk as modern for example, a system’s RAM. This way, when the system enterprise security products emulate, inspect, or quarantine shuts down or is restarted, there is no disk artifact of the such executables given that these files have no reputation. intrusion as state is lost in volatile memory when power is A month-long campaign effort can be rendered worthless on removed from the system. the first day of use if a Red Team’s artifact is immediately Python is a popular interpreted high-level programming lan- caught upon execution by the Hunt or IR teams. We offer guage [24] with an emphasis on readability and an abundance this information in the interest of raising awareness of the of reusable packages produced by the community. CPython is viability of this type of attack. As a simplification, we do the default and most widely used implementation of the Python not cover any additional security or obfuscation mechanisms language, written in C and Python. With the ctypes package, that a threat-actor would use, for example, changing bytecode CPython is as expressive as C, allowing non-interpreted native mappings or preventing memory dumps, nor do we cover how instruction execution. One common scheme unsophisticated to land onto the virtual address space of a process: These well- threat-actors use is to write their malware in Python and pack- known topics have been covered by other researchers [16], age the malware and all its dependencies into an executable [23]. Both Windows and POSIX implementations are in use in production. As a simplification, we only extensively cover the “freeze” [9] these Python C extensions and compile them into steps for Windows. However, similar steps apply to POSIX, the interpreter. However, this is not a flexible approach: You and we describe differences in Section V-E. must recompile the interpreter each time for a different set of Section II describes prior research relevant to this paper. “frozen” modules. Section III details the in-memory CPython interpreter. Sec- tion IV details loading Python scripts from memory into this C. Python module Loaders in-memory interpreter. Section V details in-memory dynamic The official stock CPython interpreter loads modules and loading of native Python C Extensions into this in-memory packages from files on disk using its PathFinder importer interpreter. Section VI covers emulation of a few security object. The interpreter also contains a frozen zipimport [20] techniques with this interpreter. Section VII covers future module that is limited to importing a single package or module work. from a ZIP file on disk. This zipimport cannot import Python C extensions, either. We constructed our cba_zipimport de- II. BACKGROUND rived from zipimport that can load multiple packages, modules, We cover prior work relevant to this paper in this section. and Python C extensions from an in-memory ZIP archive. Specifically, we cover related Python embeddings, dynamic loading techniques, and in-memory loading of Python mod- III. IN-MEMORY EMBEDDING OF PYTHON ules. We utilize the in-memory dynamic loading technique We cover the concept of embedding the Python interpreter derived from a prior work [22] in order to load native C in another program in this section. The Python interpreter Python Extensions into memory without “touching” the disk: is available as a shared library. We cover loading a shared This does not need to be reinvented. library from memory, and although this is prior art, we feel it is nonetheless important to summarize. Note that using A. Contemporary Python Embeddings an in-memory dynamic loading method [22] to load this Numerous malware have been publicized that use the Python interpreter shared library into memory is not enough Py2Exe or PyInstaller methods [7]. Signatures have been to initialize and start invoking Python scripts. We list the produced for these malware, and often security products additional necessary steps that lead to a successful initial- simply quarantine any Py2Exe or PyInstaller-generated ex- ization of the interpreter. We concluded that these were the ecutable without inspecting if the encapsulated payload is only steps necessary in an iterative fashion by running the malicious or benign. One malware was publicized in [23] CPython interpreter through a debugger observing failures, where the malware-authors recompiled the CPython interpreter reading the CPython source code, and reading official CPython and remapped the opcodes to make analysis difficult: In this documentation, in that order. case, this recompiled interpreter was still an executable file on disk. In [8], the authors take a peek inside Dropbox’s A. Embedding Python executable file and use a reflective loader [3] to inject their The Python interpreter was designed to be embedded shared library into the Dropbox process’s address space: into applications. Embedding allows an application developer The Dropbox executable file was also a Python interpreter. to implement some application functionality in the Python These listed cases differ from our approach: We show how programming language rather than in C or C++ [17]. An to download the interpreter into a location in memory and application initializes the embedded Python interpreter with then run the interpreter from there. Our approach does not a call to Py_Initialize. The interpreter can then be called “touch” the disk, meaning no intermediate step exists where from any part of the application. Simple and pure embeddings the interpreter or any dependency is saved as a file to disk in are covered in detail in [17]. The Python interpreter itself order for the modified CPython interpreter to function. is implemented in a shared library. For example, the Python 3.8 core shared library on Windows is python38.dll, and the B. Dynamic Loading main Python application (python.exe) loads this shared library The documented way to load a shared library at runtime and provides a read-eval-print loop (REPL) interface to the (also known as dynamic loading [13]) is to use functions user.