An In-memory Embedding of CPython for Offensive Use


Ateeq Sharfuddin, Brian Chapman, Chris Balles
SCYTHE
fateeq, brian, [email protected]

arXiv:2103.15202v2 [cs.CR] 4 May 2021

Abstract: We offer an embedding of CPython that runs entirely in memory without “touching” the disk. This in-memory embedding can load Python scripts directly from memory instead of these scripts having to be loaded from files on disk. Malware that resides only in memory is harder to detect or mitigate against. We intend for our work to be used by security researchers to rapidly develop and deploy offensive techniques that are difficult for security products to analyze, given these instructions are in bytecode and only translated to machine-code by the interpreter immediately prior to execution. Our work helps security researchers and enterprise Red Teams who play offense. Red Teams want to rapidly prototype malware for their periodic campaigns and do not want their malware to be detected by the Incident Response (IR) teams prior to accomplishing objectives. Red Teams also have difficulty running malware in production from files on disk as modern enterprise security products emulate, inspect, or quarantine such executables given these files have no reputation. Our work also helps enterprise Hunt and IR teams by making them aware of the viability of this type of attack. Our approach has been in use in production for over a year and meets our customers’ needs to quickly emulate threat-actors’ tactics, techniques, and procedures (TTPs).

Keywords: Computer security, Red Team, in-memory, Python

I. INTRODUCTION

Different classes of threat-actors have different motivations, and therefore, produce and utilize different types of malware to complete their goals. Nation-states are the most sophisticated threat-actors and go to great lengths to ensure the end-user is not alerted to their intrusions. As a consequence, whereas cyber-criminals are not so concerned about leaving artifacts of compromise behind, malware produced by nation-states aims to leave no artifact behind and has built-in security that hinders forensic analysis. One method threat-actors use is to keep much of the running malware only in volatile memory, for example, a system’s RAM. This way, when the system shuts down or is restarted, there is no disk artifact of the intrusion, as state in volatile memory is lost when power is removed from the system.

Python is a popular interpreted high-level programming language [24] with an emphasis on readability and an abundance of reusable packages produced by the community. CPython is the default and most widely used implementation of the Python language, written in C and Python. With the ctypes package, CPython is as expressive as C, allowing non-interpreted native instruction execution. One common scheme unsophisticated threat-actors use is to write their malware in Python and package the malware and all its dependencies into an executable using packagers like Py2Exe [6] or PyInstaller [4]. These malware are easy to construct and are executed when a coerced end-user double-clicks and runs the packaged executable file from disk. Security products have collected and analyzed numerous sample executable files of this scheme. As a result, security products have produced signatures for this scheme and are able to successfully block all malware using this scheme.

In this paper, we contribute the first embedding of the CPython interpreter that runs entirely in memory without “touching” the disk. We also contribute a scheme to support loading and running Python modules, including native C extensions, also residing only in memory, onto this interpreter. We are able to achieve these results without needing any file or dependency to reside on disk, unlike the Py2Exe or PyInstaller packaging schemes. We have not come across any other in-memory embeddings of Python. Constructing this CPython embedding was an engineering effort: We claim no new algorithms or techniques. Our paper details the necessary steps to successfully load and utilize an in-memory embedding of CPython.

Our goal is to assist security researchers and enterprise Red Teams. Enterprise Red Teams are responsible for emulating numerous classes of threat-actors. Red Teams prefer to repurpose existing malware or open-sourced security experiments. Red Teams have deadlines to meet and run campaigns with periodicity: As such, they want to emulate different threat-actors quickly, and this is more time-consuming to accomplish in a binary compiled from C or assembly than executing statements in a Python interpreter. These malware also need to evade Hunt and Incident Response teams until the objectives for the campaign are at least partially met. Red Teams also have difficulty running malware from files on disk as modern enterprise security products emulate, inspect, or quarantine such executables given that these files have no reputation. A month-long campaign effort can be rendered worthless on the first day of use if a Red Team’s artifact is immediately caught upon execution by the Hunt or IR teams. We offer this information in the interest of raising awareness of the viability of this type of attack.

As a simplification, we do not cover any additional security or obfuscation mechanisms that a threat-actor would use, for example, changing bytecode mappings or preventing memory dumps, nor do we cover how to land onto the virtual address space of a process: These well-known topics have been covered by other researchers [16], [23]. Both Windows and POSIX implementations are in use in production. As a simplification, we only extensively cover the steps for Windows. However, similar steps apply to POSIX, and we describe differences in Section V-E.

Section II describes prior research relevant to this paper. Section III details the in-memory CPython interpreter. Section IV details loading Python scripts from memory into this in-memory interpreter. Section V details in-memory dynamic loading of native Python C Extensions into this in-memory interpreter. Section VI covers emulation of a few security techniques with this interpreter. Section VII covers future work.

II. BACKGROUND

We cover prior work relevant to this paper in this section. Specifically, we cover related Python embeddings, dynamic loading techniques, and in-memory loading of Python modules. We utilize the in-memory dynamic loading technique derived from a prior work [22] in order to load native C Python Extensions into memory without “touching” the disk: This does not need to be reinvented.

A. Contemporary Python Embeddings

Numerous malware samples have been publicized that use the Py2Exe or PyInstaller methods [7]. Signatures have been produced for these malware, and often security products simply quarantine any Py2Exe or PyInstaller-generated executable without inspecting if the encapsulated payload is malicious or benign. One malware sample was publicized in [23] where the malware-authors recompiled the CPython interpreter and remapped the opcodes to make analysis difficult: In this case, this recompiled interpreter was still an executable file on disk. In [8], the authors take a peek inside Dropbox’s executable file and use a reflective loader [3] to inject their shared library into the Dropbox process’s address space: The Dropbox executable file was also a Python interpreter. These listed cases differ from our approach: We show how to download the interpreter into a location in memory and then run the interpreter from there. Our approach does not “touch” the disk, meaning no intermediate step exists where the interpreter or any dependency is saved as a file to disk in order for the modified CPython interpreter to function.

B. Dynamic Loading

The documented way to load a shared library at runtime (also known as dynamic loading [13]) is to use functions [...]

[...] “freeze” [9] these Python C extensions and compile them into the interpreter. However, this is not a flexible approach: You must recompile the interpreter each time for a different set of “frozen” modules.

C. Python Module Loaders

The official stock CPython interpreter loads modules and packages from files on disk using its PathFinder importer object. The interpreter also contains a frozen zipimport [20] module that is limited to importing a single package or module from a ZIP file on disk. This zipimport cannot import Python C extensions, either. We constructed our cba_zipimport, derived from zipimport, that can load multiple packages, modules, and Python C extensions from an in-memory ZIP archive.

III. IN-MEMORY EMBEDDING OF PYTHON

We cover the concept of embedding the Python interpreter in another program in this section. The Python interpreter is available as a shared library. We cover loading a shared library from memory, and although this is prior art, we feel it is nonetheless important to summarize. Note that using an in-memory dynamic loading method [22] to load this Python interpreter shared library into memory is not enough to initialize and start invoking Python scripts. We list the additional necessary steps that lead to a successful initialization of the interpreter. We determined iteratively that these were the only steps necessary, by running the CPython interpreter through a debugger and observing failures, reading the CPython source code, and reading official CPython documentation, in that order.

A. Embedding Python

The Python interpreter was designed to be embedded into applications. Embedding allows an application developer to implement some application functionality in the Python programming language rather than in C or C++ [17]. An application initializes the embedded Python interpreter with a call to Py_Initialize. The interpreter can then be called from any part of the application. Simple and pure embeddings are covered in detail in [17]. The Python interpreter itself is implemented in a shared library. For example, the Python 3.8 core shared library on Windows is python38.dll, and the main Python application (python.exe) loads this shared library and provides a read-eval-print loop (REPL) interface to the user.
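
The idea in Section II-C of importing packages and modules from an in-memory ZIP archive can be illustrated with the standard importlib hooks. The following is a minimal sketch under stated assumptions, not the authors' cba_zipimport: the class name InMemoryZipFinder, the "memory:" origin prefix, and the demo package memdemo_pkg are all hypothetical, and the sketch handles pure-Python modules only. Native C extensions would additionally need the in-memory dynamic loading the paper covers in Section V.

```python
import importlib.abc
import importlib.util
import io
import sys
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that serves pure-Python modules from ZIP bytes."""

    def __init__(self, zip_bytes):
        # The archive is opened over a bytes buffer; no file path is involved.
        self._zip = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._zip.namelist())

    def _locate(self, fullname):
        # Map "pkg.mod" to "pkg/mod/__init__.py" (package) or "pkg/mod.py" (module).
        base = fullname.replace(".", "/")
        for member, is_pkg in ((base + "/__init__.py", True), (base + ".py", False)):
            if member in self._names:
                return member, is_pkg
        return None

    def find_spec(self, fullname, path=None, target=None):
        found = self._locate(fullname)
        if found is None:
            return None  # let the other finders on sys.meta_path try
        member, is_pkg = found
        return importlib.util.spec_from_loader(
            fullname, self, origin="memory:" + member, is_package=is_pkg
        )

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        member, _ = self._locate(module.__name__)
        source = self._zip.read(member)
        # Compile and run the module source read from the in-memory archive.
        exec(compile(source, "memory:" + member, "exec"), module.__dict__)


# Build a small ZIP archive in memory to stand in for a downloaded payload.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("memdemo_pkg/__init__.py", "GREETING = 'hello'\n")
    zf.writestr("memdemo_pkg/util.py", "def double(x):\n    return 2 * x\n")

# Appending (rather than prepending) avoids shadowing installed modules.
sys.meta_path.append(InMemoryZipFinder(buf.getvalue()))

import memdemo_pkg.util

print(memdemo_pkg.GREETING, memdemo_pkg.util.double(21))
```

Because the specs are created without a file location, the imported modules carry no `__file__` attribute, which makes the absence of an on-disk source visible from within Python. Unlike the stock zipimport described above, one finder instance here serves several packages and modules from the same archive.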