Masarykova univerzita Fakulta informatiky

Analysis of Malware Classification Schemas

Master’s Thesis

Bc. Peter Nemček

Brno, Fall 2013

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. Peter Nemček

Advisor: Mgr. Vít Bukač


Acknowledgement

I would like to thank my parents, who supported me throughout my studies; my girlfriend, who greatly supported me throughout my studies and during the writing of this thesis; and my thesis advisor, Mgr. Vít Bukač, for his valuable suggestions and comments on this thesis.


Abstract

The aim of this thesis is to analyze and compare the properties and behavior of various malware classification schemas (YARA, OpenIOC, MAEC/Mitre). The theoretical chapter describes each schema, compares the schemas with one another, and discusses the strengths and weaknesses of each. The practical chapter proposes a conversion tool in the form of a web service and implements some of its basic functionality. The thesis also proposes further research directions.


Keywords

Malware, Malware Analysis, Malware Signatures, YARA, OpenIOC, MAEC, Malware Signature Conversion, APT


Contents

1 Introduction
  1.1 Motivation – APT1
    Key Findings
2 Malware Types and Properties
  2.1 Behavior
    2.1.1 Downloaders and Launchers
    2.1.2 Backdoors
      Reverse Shell
      Remote Administration Tools
      Botnets
    2.1.3 Credential Stealers
      GINA Interception
      Hash Dumping
      Keystroke Logging
    2.1.4 Maintaining persistence
      The Windows Registry
      Trojanized System Binaries
      DLL Load-Order Hijacking
    2.1.5 Privilege Escalation
      Using SeDebugPrivilege
    2.1.6 User-Mode Rootkits
      IAT Hooking
      Inline Hooking
  2.2 Launching Malware Silently
    2.2.1 Launchers
    2.2.2 Process Injection
      DLL Injection
      Direct Injection
    2.2.3 Process Replacement
    2.2.4 Hook Injection
      Local and Remote Hooks
      Keyloggers Using Hooks
      Using SetWindowsHookEx
      Thread Targeting
    2.2.5 Detours
    2.2.6 APC Injection
      APC Injection from User Space
      APC Injection from Kernel Space
3 Leaving Footprints
  3.1 Signatures
  3.2 Tools for Effective Signature Creation
    3.2.1 Staying Anonymous
    3.2.2 Creating a Secure and Controlled Environment
      Tools That May Come Handy
      Virtual Machines
4 Signature Formats
  4.1 Mandiant’s OpenIOC
    4.1.1 OpenIOC
    4.1.2 IOC Functionality
    4.1.3 Using IOC in the Investigative Lifecycle
  4.2 YARA
    4.2.1 Creating Rules
      Strings
      Conditions
    4.2.2 Release of Version 2.0
    4.2.3 Advantages of YARA format
  4.3 MAEC
    4.3.1 MAEC Language
      Low Level – Abstracted Actions
      Mid Level – Behaviors
      High Level – Mechanisms
      Example Mapping
      The MAEC Bundle Output Format
      The MAEC Package Output Format
    4.3.2 High Level Use Cases for the MAEC Language
    4.3.3 Advantages of MAEC format
    4.3.4 Disadvantages of MAEC format
  4.4 Comparison of Mentioned Formats
5 A Tool for Malware Signature Conversion
  5.1 Requirements
  5.2 Technology Used
    5.2.1 Java
    5.2.2 Spring Framework
    5.2.3 Server
  5.3 The Tool
  5.4 Internal Format
    5.4.1 Workflow
  5.5 How to Run the Tool
    Running Using a Batch Script
    Running Via Maven
    Running Using a Server
  5.6 A Simple Use Case
6 Conclusion
A Contents of the Attached CD
B Mandiant’s OpenIOC signature format
C YARA signature format
D MAEC signature format


1 Introduction

It is hard to imagine a world in which everything is perfect, where nobody threatens anyone and everybody is happy and secure. Unfortunately, everyday experience tells us otherwise. Add to this the fact that almost everything has already been (or is being) converted to electronic form. Almost every activity has some connection to electronic devices or electronic communication, be it a simple task like walking through the city with a smartphone connected to the internet or a more sophisticated one (e.g., buying something through PayPal). In the former example, someone might be interested in where you currently are, but such people are a very small minority. In the latter, however, virtually every malicious entity is interested in money. Unfortunately, money makes the world go round, and it takes a large amount of money to stop losing an even larger sum. According to [1], cyber security was the only area in which spending was significantly increased.

As the years go by, attacks are no longer as undirected as they used to be. There are specialized people, even military groups (as documented in the recent APT1 threat), who attack in the most subtle way, trying to hide from every possible defense mechanism and to hit their victims with precise strikes. Leaving no traces is essential for the attackers, as they want to gather as much information (and, of course, money) as possible. The APT1 threat was employed by the Chinese government trying to steal vital information from U.S. military organizations. The threat is not limited to military organizations; it concerns any company that holds valuable intellectual property. It is crucial to be able to identify and thwart attacks by malicious entities in order to protect ourselves from data and/or money theft.

1.1 Motivation – APT1

Since 2004, Mandiant¹ has investigated computer security breaches at hundreds of organizations around the world. The majority of these security breaches are attributed to advanced threat actors referred to as the ‘Advanced Persistent Threat’ (APT). Mandiant first published details about the APT in their January 2010 M-Trends report. As they stated in the report, their position was that ‘The Chinese government may authorize this activity, but there’s no way to determine the extent of its involvement.’ Now, three years later, they have the evidence required to change their assessment. The details they have analyzed during hundreds of investigations convince them that the groups conducting these activities are based primarily in China and that the Chinese Government is aware of them.[2]

Mandiant continues to track dozens of APT groups around the world; however, the APT1 report [2] is focused on the most prolific of these groups. They refer to this group as ‘APT1’ and it is one of more than 20 APT groups with origins in China. APT1 is a single organization of operators that has conducted a cyber espionage campaign against a broad range of victims since at least 2006. From their observations, it is one of the most prolific cyber espionage groups in terms of the sheer quantity of information stolen. The scale and impact of APT1’s operations compelled them to write the report.[2]

The activity they have directly observed likely represents only a small fraction of the cyber espionage that APT1 has conducted. Though their visibility of APT1’s activities is incomplete, they have analyzed the group’s intrusions against nearly 150 victims over seven years. From their unique vantage point responding to victims, they tracked APT1 back to four large networks in Shanghai, two of which are allocated directly to the Pudong New Area. They uncovered a substantial amount of APT1’s attack infrastructure, command and control, and modus operandi (tools, tactics, and procedures).
In an effort to underscore that there are actual individuals behind the keyboard, Mandiant revealed three personas they have attributed to APT1. These operators, like soldiers, may merely be following orders given to them by others.[2]

Their analysis has led them to conclude that APT1 is likely government-sponsored and one of the most persistent of China’s cyber threat actors. They believe that APT1 is able to wage such a long-running and extensive cyber espionage campaign in large part because it receives direct government support. In seeking to identify the organization behind this activity, their research found that the People’s Liberation Army’s (PLA’s) Unit 61398 is similar to APT1 in its mission, capabilities, and resources. PLA Unit 61398 is also located in precisely the same area from which APT1 activity appears to originate.[2]

1. http://www.mandiant.com/

Key Findings

Location APT1 is believed to be the 2nd Bureau of the People’s Liberation Army (PLA) General Staff Department’s (GSD) 3rd Department, which is most commonly known by its Military Unit Cover Designator (MUCD) as Unit 61398. Unit 61398 is believed to be staffed by hundreds or thousands of people, based on the size of its physical infrastructure. It resides in a 12-story building built in 2007 and has a special fiber optic connection provided directly by China Telecom. Its personnel are required to be trained in computer security and operations and to be proficient in the English language.[2]

Nature of Stolen Data APT1 has systematically stolen hundreds of terabytes of data from at least 141 organizations, and has demonstrated the capability and intent to steal from dozens of organizations simultaneously. APT1 has a well-defined attack methodology, honed over years and designed to steal large volumes of valuable intellectual property. Once APT1 has established access, they periodically revisit the victim’s network over several months or years and steal broad categories of intellectual property, including technology blueprints, proprietary manufacturing processes, test results, business plans, pricing documents, partnership agreements, and emails and contact lists from victim organizations’ leadership. APT1 uses some tools and techniques that Mandiant has not yet observed being used by other groups, including two utilities designed to steal email: GETMAIL and MAPIGET. Among other large-scale thefts of intellectual property, they have observed APT1 stealing 6.5 terabytes of compressed data from a single organization over a ten-month time period.[2]

Targets Of the 141 APT1 victims, 87% are headquartered in countries where English is the native language. The industries APT1 targets match industries that China has identified as strategic to their growth, including four of the seven strategic emerging industries that China identified in its 12th Five-Year Plan.[2]


Infrastructure APT1 controls thousands of systems in support of their computer intrusion activities. In the last two years² Mandiant has observed APT1 establish a minimum of 937 Command and Control (C2) servers hosted on 849 distinct IP addresses in 13 countries. The majority of these 849 unique IP addresses were registered to organizations in China (709), followed by the U.S. (109). Over a two-year period (January 2011 to January 2013) Mandiant confirmed 1,905 instances of APT1 actors logging into their attack infrastructure from 832 different IP addresses with Remote Desktop, a tool that provides a remote user with an interactive graphical interface to a system. In the last several years Mandiant has confirmed 2,551 fully qualified domain names (FQDNs) attributed to APT1.[2]

Attacks’ Origin In 1,849 of the 1,905 (97%) Remote Desktop sessions APT1 conducted under Mandiant’s observation, the APT1 operator’s keyboard layout setting was ‘Chinese (Simplified) – US Keyboard’. Microsoft’s Remote Desktop client configures this setting automatically based on the selected language on the client system. Therefore, the APT1 attackers likely have their Microsoft operating system configured to display Simplified Chinese fonts. 817 of the 832 (98%) IP addresses logging into APT1-controlled systems using Remote Desktop resolved back to China. Mandiant observed 767 separate instances in which APT1 intruders used the ‘HUC Packet Transmit Tool’ or HTRAN to communicate between 614 distinct routable IP addresses and their victims’ systems using their attack infrastructure. Of the 614 distinct IP addresses used for HTRAN communications:[2]

• 614 of 614 (100%) were registered in China.

• 613 (99.8%) were registered to one of four Shanghai net blocks.

Staff The size of APT1’s infrastructure implies a large organization with at least dozens, but potentially hundreds of human operators. Mandiant conservatively estimates that APT1’s current attack infrastructure includes over 1,000 servers. Given the volume, duration and type of attack activity Mandiant has observed, APT1 operators would need to be directly supported by linguists, open source researchers, malware authors, industry experts who translate task requests from requestors to the operators, and people who then transmit stolen information to the requestors. APT1 would also need a sizable IT staff dedicated to acquiring and maintaining computer equipment, people who handle finances, facility management, and logistics (e.g., shipping).[2]

2. February 2013

Protection Mandiant released more than 3,000 APT1 indicators, such as domain names, IP addresses, and MD5 hashes of malware, to bolster defenses against APT1 operations. They also released sample Indicators of Compromise (IOCs) and detailed descriptions of over 40 families of malware in APT1’s arsenal of digital weapons. Last but not least, they released thirteen X.509 encryption certificates used by APT1.[2]


2 Malware Types and Properties

It is best to get to know the groups of malware and to classify them by their behavior. According to [3], malware falls into the following groups (one more recent group is added at the end):

• Backdoor – Malicious code that installs itself onto a computer to give the attacker easy, hidden access. It requires little or no authentication and allows the attacker to execute commands.

• Botnet – Similar to a backdoor in that it allows the attacker to connect to the victim’s computer, but the infected computers form a logical network that can perform various attacks (such as DDoS³ or sending spam) on the attacker’s behalf.

• Downloader – Malicious code whose only purpose is to download other malicious files. A downloader is typically the first malware installed, which is logical, as it performs hardly any malicious activity per se.

• Information-stealing malware – Malware that collects information from a victim’s computer and usually sends it to the attacker. Examples include sniffers, password hash grabbers, and keyloggers. This malware is typically used to gain access to online accounts such as email or online banking.[3]

• Launcher – Malicious program used to launch other malicious programs. Launchers typically use various uncommon techniques in order to hide themselves and/or escalate their system privileges.

• Rootkit – Code whose job is to hide the presence of other malware. A rootkit changes default system functions in order to prevent actions that would harm the malware⁴. It can also damage industrial systems and production by changing their behavior and parameters.

• Scareware – Malware designed to frighten an infected user into buying something. It usually has a user interface that makes it look like an antivirus or other security program. It informs users that there is malicious code on their system and that the only way to get rid of it is to buy their ‘software’, when in reality, the software it’s selling does nothing more than remove the scareware. [3]

• Spam-sending malware – Malware that infects a user’s machine and then uses that machine to send spam. This malware generates income for attackers by allowing them to sell spam-sending services.

• Worm or virus – Malicious code that can copy itself and infect additional computers. There are also specialized cases where only local networks are targeted.⁵

• Ransomware – Malware that either encrypts or pretends to encrypt users’ important files (documents, photos, videos, etc.) and demands a ransom. If it only pretends to encrypt files, it can be removed fairly easily and the users’ documents remain intact. If it actually encrypts the files with a key whose private part exists only on a remote host (so it is never exposed on the victim’s machine), removing the malware may be easy, but the user is still left with encrypted files. Users are often put under pressure by a countdown, so they are psychologically pushed to pay the ransom. The infection may not be apparent at first glance, since the malware demands money only once the encryption is complete. The best protection is to back up data regularly and never pay the attackers.

3. Distributed Denial-of-Service
4. For example, preventing folder enumeration or the deletion of malicious files.
5. A recent example would be Stuxnet (June 2010), which is believed to have been designed to attack Iran’s nuclear facilities.

2.1 Behavior

A further breakdown of malware types deals with their behavioral characteristics. These typical signs and descriptions are extracted from [3].

2.1.1 Downloaders and Launchers

Two commonly encountered types of malware are downloaders and launchers. Downloaders simply download another piece of malware from the Internet and execute it on the local system. Downloaders are often packaged with an exploit. Downloaders commonly use the Windows API, such as URLDownloadToFileA followed by a call to WinExec, to download and execute new malware. [3]

A launcher (also known as a loader) is any executable that installs malware for immediate or future covert execution. Launchers often contain the malware that they are designed to load. The most common example is an executable or DLL stored in the launcher’s own resource section. The resource section in the Windows PE file format is used by the executable but is not considered part of its code. Examples of the normal contents of the resource section include icons, images, menus, and strings. Launchers will often store malware within the resource section. When the launcher is run, it extracts an embedded executable or DLL from the resource section before launching it. If the resource section is compressed or encrypted, the malware must decompress or decrypt the extracted resource before loading it. This often means that you will see the launcher use resource-manipulation API functions such as FindResource, LoadResource, and SizeofResource.

Malware launchers often must be run with administrator privileges or escalate themselves to have those privileges. Ordinary user processes cannot perform all of the techniques described here, so a launcher started by one must escalate its privileges. The fact that launchers may contain privilege-escalation code provides another way to identify them. [3]
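The API names mentioned above can be turned into a very rough triage check. The following is a minimal sketch, assuming the import names of a sample have already been extracted by some other tool; the function names and the labels are hypothetical, and a match is only a hint, not proof of malicious behavior.

```python
# Illustrative heuristic only: the API groupings follow the text above,
# while the function names and labels are assumptions made for this sketch.
DOWNLOADER_APIS = ("URLDownloadToFile", "WinExec")
RESOURCE_APIS = ("FindResource", "LoadResource", "SizeofResource")


def _has_api(imports, api):
    """True if any import starts with the API name (covers A/W suffixes)."""
    api = api.lower()
    return any(name.lower().startswith(api) for name in imports)


def classify_by_imports(imports):
    """Return a set of labels suggested by the imported function names."""
    labels = set()
    if all(_has_api(imports, api) for api in DOWNLOADER_APIS):
        labels.add("possible downloader")
    if all(_has_api(imports, api) for api in RESOURCE_APIS):
        labels.add("possible launcher")
    return labels


print(classify_by_imports(["URLDownloadToFileA", "WinExec"]))
```

Legitimate software also uses these functions, so the result should only be used to prioritize samples for closer analysis.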

2.1.2 Backdoors

A backdoor is a type of malware that provides an attacker with remote access to a victim’s machine. Backdoors are the most commonly found type of malware, and they come in all shapes and sizes with a wide variety of capabilities. Backdoor code often implements a full set of capabilities, so when using a backdoor attackers typically don’t need to download additional malware or code. Backdoors communicate over the Internet in numerous ways, but a common method is over port 80 using the HTTP protocol. HTTP is the most commonly used protocol for outgoing network traffic, so it offers malware the best chance to blend in with the rest of the traffic. Backdoors come with a common set of functionality, such as the ability to manipulate registry keys, enumerate display windows, create directories, search files, and so on. You can determine which of these features is implemented by a backdoor by looking at the Windows functions it uses and imports. [3]
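As a rough illustration of reading capabilities off the import table, the sketch below maps a few well-known Windows API name prefixes to the capabilities listed above. The mapping is an assumption made for this example and is far from complete; the import list is assumed to come from a separate PE parsing step.

```python
# Hypothetical, incomplete mapping from capability to Windows API name prefixes.
CAPABILITY_APIS = {
    "registry manipulation": ("RegSetValue", "RegCreateKey", "RegDeleteKey"),
    "window enumeration": ("EnumWindows",),
    "directory creation": ("CreateDirectory",),
    "file search": ("FindFirstFile", "FindNextFile"),
}


def backdoor_capabilities(imports):
    """Return the sorted capabilities hinted at by a list of import names."""
    lowered = [name.lower() for name in imports]
    found = []
    for capability, apis in CAPABILITY_APIS.items():
        # Prefix matching covers the ANSI/Unicode (A/W) and Ex variants.
        if any(name.startswith(api.lower()) for api in apis for name in lowered):
            found.append(capability)
    return sorted(found)


print(backdoor_capabilities(["RegSetValueExA", "FindFirstFileW", "send"]))
```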


Reverse Shell

A reverse shell is a connection that originates from an infected machine and provides attackers shell access to that machine. Reverse shells are found as both stand-alone malware and as components of more sophisticated backdoors. Once in a reverse shell, attackers can execute commands as if they were on the local system. [3]

Netcat Reverse Shells Netcat can be used to create a reverse shell by running it on two machines. Attackers have been known to use Netcat or package Netcat within other malware. When Netcat is used as a reverse shell, the remote machine waits for incoming connections using nc -l -p 80. Next, the victim machine connects out and provides the shell using nc listener_ip 80 -e cmd.exe. The listener_ip 80 parts are the IP address and port on the remote machine. The -e option is used to designate a program to execute once the connection is established, tying the standard input and output from the program to the socket (on Windows, cmd.exe is often used, as discussed next). [3]
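From a defender's point of view, the two command lines above are easy to recognize in process or shell logs. The following is a minimal sketch of such a check; the function name and the returned labels are hypothetical, and it deliberately handles only the simple forms shown above (a bare nc command name and separate -l and -e options).

```python
import shlex


def parse_netcat_command(command):
    """Classify a Netcat command line as 'listener', 'reverse shell', or None."""
    tokens = shlex.split(command)
    if not tokens or tokens[0].lower() not in ("nc", "nc.exe", "netcat"):
        return None
    if "-l" in tokens:
        # Listening side: waits for the victim to connect.
        return "listener"
    if "-e" in tokens:
        # Connecting side that attaches a program to the socket.
        return "reverse shell"
    return None


print(parse_netcat_command("nc 192.0.2.10 80 -e cmd.exe"))
```

Renamed binaries or combined option strings would not be caught by this sketch, so it illustrates the idea rather than a usable detector.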

Windows Reverse Shells Attackers employ two simple malware coding implementations for reverse shells on Windows using cmd.exe: basic and multithreaded. The basic method is popular among malware authors, since it’s easier to write and generally works just as well as the multithreaded technique. It involves a call to CreateProcess and the manipulation of the STARTUPINFO structure that is passed to CreateProcess. First, a socket is created and a connection to a remote server is established. That socket is then tied to the standard streams (standard input, standard output, and standard error) for cmd.exe. CreateProcess runs cmd.exe with its window suppressed, to hide it from the victim. [3]

The multithreaded version of a Windows reverse shell involves the creation of a socket, two pipes, and two threads (so it’s best to look for API calls to CreateThread and CreatePipe). This method is sometimes used by malware authors as part of a strategy to manipulate or encode the data coming in or going out over the socket. CreatePipe can be used to tie together read and write ends to a pipe, such as standard input (stdin) and standard output (stdout). The CreateProcess method can be used to tie the standard streams to pipes instead of directly to the sockets. After CreateProcess is called, the malware will spawn two threads: one for reading from the stdin pipe and writing to the socket, and the other for reading the socket and writing to the stdout pipe. Commonly, these threads manipulate the data using data encoding. Reverse-engineering the encoding/decoding routines used by the threads can be performed to decode packet captures containing encoded sessions. [3]
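To illustrate the last point, suppose reverse engineering showed that a sample encodes its traffic with a single-byte XOR. Both the scheme and the key 0x5A are purely hypothetical assumptions for this sketch; a real sample may use any encoding, and the routine would have to mirror whatever the analysis uncovers.

```python
def xor_decode(data, key):
    """XOR every byte of data with a single-byte key (encoding == decoding)."""
    return bytes(b ^ key for b in data)


# Hypothetical payload as it might appear in a packet capture.
captured = xor_decode(b"dir C:\\", 0x5A)
print(xor_decode(captured, 0x5A))
```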

Remote Administration Tools

A remote administration tool (RAT) is used to remotely manage a computer or computers. RATs are often used in targeted attacks with specific goals, such as stealing information or moving laterally across a network. Figure 2.1 shows the RAT network structure. The server is running on a victim host implanted with malware. The client is running remotely as the command and control unit operated by the attacker. The servers beacon to the client to start a connection, and they are controlled by the client. RAT communication is typically over common ports like 80 and 443. [3]

Figure 2.1: RAT Network Structure, source: [3]

Botnets

A botnet is a collection of compromised hosts, known as zombies, that are controlled by a single entity, usually through the use of a server known as a botnet controller. The goal of a botnet is to compromise as many hosts as possible in order to create a large network of zombies that the botnet uses to spread additional malware or spam, or perform a distributed denial-of-service (DDoS) attack. Botnets can take a website offline by having all of the zombies attack the website at the same time. [3]


2.1.3 Credential Stealers

Attackers often go to great lengths to steal credentials, primarily with three types of malware: [3]

• Programs that wait for a user to log in in order to steal their credentials

• Programs that dump information stored in Windows, such as password hashes, to be used directly or cracked offline

• Programs that log keystrokes

GINA Interception

On Windows XP, Microsoft’s Graphical Identification and Authentication (GINA) interception is a technique that malware uses to steal user credentials. The GINA system was intended to allow legitimate third parties to customize the logon process by adding support for things like authentication with hardware radio-frequency identification (RFID) tokens or smart cards. Malware authors take advantage of this third-party support to load their credential stealers.

GINA is implemented in a DLL, msgina.dll, and is loaded by the Winlogon executable during the login process. Winlogon also works for third-party customizations implemented in DLLs by loading them in between Winlogon and the GINA DLL (like a man-in-the-middle attack). Windows conveniently provides the following registry location where third-party DLLs will be found and loaded by Winlogon:

HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\GinaDLL

Because the malicious DLL (named fsgina.dll in the example in [3]) intercepts the communication between Winlogon and msgina.dll, it must pass the credential information on to msgina.dll so that the system will continue to operate normally. In order to do so, the malware must contain all DLL exports required by GINA; specifically, it must export more than 15 functions, most of which are prepended with Wlx. Clearly, if a suspicious DLL contains many export functions that begin with the string Wlx, it is a good indicator that this DLL is a GINA interceptor. [3]
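The export-name indicator described above can be sketched as a simple count. The function name and the default threshold of 15 are assumptions chosen to mirror the text; the list of export names is assumed to come from a separate PE parsing step.

```python
def looks_like_gina_interceptor(exports, min_wlx_exports=15):
    """Return True if more than min_wlx_exports export names start with 'Wlx'."""
    wlx_exports = [name for name in exports if name.startswith("Wlx")]
    return len(wlx_exports) > min_wlx_exports


# Synthetic export names, used only to exercise the check.
sample_exports = ["WlxExport%d" % i for i in range(16)] + ["DllMain"]
print(looks_like_gina_interceptor(sample_exports))
```

Note that the legitimate msgina.dll matches this check as well, so the file name and location of the DLL have to be considered together with the result.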

Hash Dumping

Dumping Windows hashes is a popular way for malware to access system credentials. Attackers try to grab these hashes in order to crack them offline or to use them in a pass-the-hash attack. A pass-the-hash attack uses LM⁶ and NTLM⁷ hashes to authenticate to a remote host (using NTLM authentication) without needing to decrypt or crack the hashes to obtain the plaintext password to log in. Pwdump and the Pass-the-Hash (PSH) Toolkit are freely available packages that provide hash dumping. Since both of these tools are open source, a lot of malware is derived from their source code. Most antivirus programs have signatures for the default compiled versions of these tools, so attackers often try to compile their own versions in order to avoid detection.

Pwdump is a set of programs that outputs the LM and NTLM password hashes of local user accounts from the Security Account Manager (SAM). Pwdump works by performing DLL injection inside the Local Security Authority Subsystem Service (LSASS) process (better known as lsass.exe). This way malware can run a DLL inside another process, thereby providing that DLL with all of the privileges of that process. Hash dumping tools often target lsass.exe because it has the necessary privilege level as well as access to many useful API functions. [3]

Keystroke Logging

Keylogging is a classic form of credential stealing. When keylogging, malware records keystrokes so that an attacker can observe typed data like usernames and passwords. Windows malware uses many forms of keylogging. [3]

Kernel-Based Keyloggers Kernel-based keyloggers are difficult to detect with user-mode applications. They are frequently part of a rootkit and they can act as keyboard drivers to capture keystrokes, bypassing user-space programs and protections. [3]

User-Space Keyloggers Windows user-space keyloggers typically use the Windows API and are usually implemented with either hooking or polling. Hooking uses the Windows API to notify the malware each time a key is pressed, typically with the SetWindowsHookEx function. Polling uses the Windows API to constantly poll the state of the keys, typically using the GetAsyncKeyState and GetForegroundWindow functions. A hooking keylogger may come packaged as an executable that initiates the hook function, and may include a DLL file to handle logging that can be mapped into many processes on the system automatically. [3]

6. LAN Manager hash
7. NT LAN Manager


Identifying Keyloggers in Strings Listings An analyst can spot keylogger functionality in malware by looking at the imports for the API functions, or by examining the strings listing for indicators. The strings listing is particularly useful if the imports are obfuscated or the malware is using keylogging functionality that the analyst has not encountered before. Of course, even strings can be obfuscated, in which case additional malware analysis techniques will be required. For example, the following listing of strings is taken from one keylogger sample: [3]

[Up]
[Num Lock]
[Down]
[Right]
[UP]
[Left]
[PageDown]

If a keylogger wants to log all keystrokes, it must have a way to print keys like PAGE DOWN, and must have access to these strings. Working backward from the cross-references to these strings can be a way to recognize keylogging functionality in malware. [3]
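The strings check described above can be sketched with a regular expression. The list of key labels below is an assumption based on the sample listing and would need to be extended for real use; the input is assumed to be the output of a strings-extraction tool, one string per list item.

```python
import re

# Hypothetical, incomplete set of bracketed key labels to look for.
KEY_LABEL = re.compile(
    r"\[(Up|Down|Left|Right|PageUp|PageDown|Num Lock|Caps Lock|Enter|Tab)\]",
    re.IGNORECASE,
)


def keylogger_key_labels(strings):
    """Return the distinct key labels (lower-cased) found in a strings listing."""
    found = set()
    for text in strings:
        for match in KEY_LABEL.finditer(text):
            found.add(match.group(1).lower())
    return found


listing = ["[Up]", "[Num Lock]", "[Down]", "[Right]", "[UP]", "[Left]", "[PageDown]"]
print(sorted(keylogger_key_labels(listing)))
```

Several distinct labels in one binary make keylogging functionality more plausible; a single hit on its own says very little.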

2.1.4 Maintaining persistence

After gaining access to a system, malware usually tries to stay there for a long time. This behavior is known as persistence. If the persistence mechanism is unique enough, it can serve as a great way to fingerprint a given piece of malware. To carry out its malicious activity, malware has to be present on the victim’s system: it has to receive commands and updates and send private information to the attacker. The attacker can also make it dormant, which makes it harder to detect while idle. In most cases, the malware is present on the hard disk drive while it is running. [3]

The Windows Registry

It is common for malware to access the registry to store configuration information, gather information about the system, and install itself persistently. A popular place for malware to install itself on the victim’s machine is the registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run


There are many other persistence locations in the registry that can be enumerated by various tools like the Autoruns program by Sysinternals, which shows the user all the programs that run automatically on their system. A few popular registry entries are worth expanding on: AppInit_DLLs, Winlogon, and SvcHost DLLs. [3]
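When registry activity is logged during analysis, the recorded paths can be compared against a watch list of persistence locations. The sketch below is a minimal example that assumes only two watched keys, the Run key and the Winlogon key mentioned earlier in this chapter; the function names are hypothetical and the list would have to be extended for practical use. Registry paths are compared case-insensitively, and the HKLM abbreviation is expanded first.

```python
HIVE_ALIASES = {"hklm": "hkey_local_machine", "hkcu": "hkey_current_user"}

# Watched keys in normalized (lower-case, full hive name) form.
WATCHED_KEYS = {
    r"hkey_local_machine\software\microsoft\windows\currentversion\run",
    r"hkey_local_machine\software\microsoft\windows nt\currentversion\winlogon",
}


def normalize_key(path):
    """Lower-case a registry path and expand an abbreviated hive name."""
    path = path.strip().rstrip("\\").lower()
    hive, _, rest = path.partition("\\")
    hive = HIVE_ALIASES.get(hive, hive)
    return hive + "\\" + rest if rest else hive


def is_watched_key(path):
    """True if path is a watched key or lies below one (subkey or value)."""
    key = normalize_key(path)
    return any(key == w or key.startswith(w + "\\") for w in WATCHED_KEYS)


print(is_watched_key(r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run"))
```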

AppInit_DLLs Malware authors can gain persistence for their DLLs through a special registry location called AppInit_DLLs. AppInit_DLLs are loaded into every process that loads User32.dll, and a simple insertion into the registry will make AppInit_DLLs persistent. The AppInit_DLLs value is stored in the following Windows registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows

The AppInit_DLLs value is of type REG_SZ and consists of a space-delimited string of DLLs. Most processes load User32.dll, and all of those processes also load the AppInit_DLLs. Malware authors often target individual processes, but AppInit_DLLs will be loaded into many processes. Therefore, malware authors must check to see in which process the DLL is running before executing their payload. This check is often performed in DllMain of the malicious DLL. [3]
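Because the value is a plain space-delimited string, it is easy to split and compare against a list of DLLs that are expected on a given system. The following is a minimal sketch under that assumption; the function names and the allow-list approach are hypothetical, and paths containing spaces are not handled.

```python
def parse_appinit_dlls(value):
    """Split a space-delimited AppInit_DLLs string into individual entries."""
    return [entry for entry in value.split(" ") if entry]


def unexpected_appinit_dlls(value, allowed):
    """Return entries that are not in the allow-list (case-insensitive)."""
    allowed_lower = {name.lower() for name in allowed}
    return [dll for dll in parse_appinit_dlls(value)
            if dll.lower() not in allowed_lower]


# Hypothetical registry value and allow-list.
print(unexpected_appinit_dlls("good.dll evil.dll", {"GOOD.DLL"}))
```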

Winlogon Notify

Malware authors can hook malware to a particular Winlogon event, such as logon, logoff, startup, shutdown, and lock screen. This can even allow the malware to load in safe mode. The registry entry consists of the Notify value in the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\

When winlogon.exe generates an event, Windows checks the Notify registry key for a DLL that will handle it. [3]

SvcHost DLLs

All services persist in the registry, and if they’re removed from the registry, the service won’t start. Malware is often installed as a Windows service, but typically uses an executable. Installing malware for persistence as an svchost.exe DLL makes the malware blend into the process list and the registry better than a standard service. Svchost.exe is a generic host process for services that run from DLLs, and Windows systems often have many instances of svchost.exe running at once. Each instance of svchost.exe contains a group of services that makes development, testing, and service group management easier. The groups are defined at the following registry location (each value represents a different group):

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost

Services are defined in the registry at the following location:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ServiceName

Windows services contain many registry values, most of which provide information about the service, such as DisplayName and Description. Malware authors often set values that help the malware blend in, such as NetWareMan, which ‘Provides access to file and print resources on NetWare networks.’ Another service registry value is ImagePath, which contains the location of the service executable. In the case of an svchost.exe DLL, this value contains %SystemRoot%/System32/svchost.exe -k GroupName. All svchost.exe DLLs contain a Parameters key with a ServiceDLL value, which the malware author sets to the location of the malicious DLL. The Start value, also under the Parameters key, determines when the service is started (malware is typically set to launch during system boot). Windows has a set number of service groups predefined, so malware will typically not create a new group, since that would be easy to detect. Instead, most malware will add itself to a preexisting group or overwrite a nonvital service, often a rarely used service from the netsvcs service group. To identify this technique, monitor the Windows registry using dynamic analysis, or look for service functions such as CreateServiceA in the disassembly. If malware is modifying these registry keys, you’ll know that it’s using this persistence technique. [3]
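The registry values described above can be checked mechanically once a service entry has been exported. The sketch below is a minimal heuristic in Python: it treats a service as suspicious when its ImagePath launches svchost.exe with a group argument and its ServiceDLL lies outside %SystemRoot%\System32. The dictionary layout, the function names, and the rule itself are assumptions made for illustration; a real triage would also have to expand environment variables and verify file signatures.

```python
import ntpath  # Windows path rules, usable on any platform


def svchost_group(image_path):
    """Return the group name following -k if ImagePath runs svchost.exe, else None."""
    parts = image_path.replace("/", "\\").split()
    if not parts or ntpath.basename(parts[0]).lower() != "svchost.exe":
        return None
    # Look at each argument together with the argument that follows it.
    for flag, value in zip(parts[1:], parts[2:]):
        if flag.lower() in ("-k", "/k"):
            return value
    return None


def suspicious_service_dll(service):
    """Heuristic: svchost-hosted service whose DLL is outside System32."""
    group = svchost_group(service.get("ImagePath", ""))
    dll = service.get("Parameters", {}).get("ServiceDLL")
    if group is None or dll is None:
        return False
    folder = ntpath.dirname(dll.replace("/", "\\")).lower()
    return folder != r"%systemroot%\system32"


if __name__ == "__main__":
    # Hypothetical exported service entry.
    entry = {
        "ImagePath": "%SystemRoot%/System32/svchost.exe -k netsvcs",
        "Parameters": {"ServiceDLL": r"C:\Users\Public\evil.dll"},
    }
    print(svchost_group(entry["ImagePath"]), suspicious_service_dll(entry))
```

Malware that overwrites a nonvital service with a DLL placed inside System32 would pass this check, so the heuristic complements registry monitoring rather than replacing it.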

Trojanized System Binaries

Another way that malware gains persistence is by trojanizing system binaries. With this technique, the malware patches bytes of a system binary to force the system to execute the malware the next time the infected binary is run or loaded. Malware authors typically target a system binary that is used frequently in normal Windows operation. DLLs are a popular target. A system binary is typically modified by patching the entry function so that it jumps to the malicious code. The patch overwrites the very beginning of the function or some other code that is not required for the trojanized DLL to operate properly. The malicious code is added to an empty section of the binary, so that it will not impact normal operation. The inserted code typically loads malware and will function no matter where it’s inserted in the infected DLL. After the code loads the malware, it jumps back to the original DLL code, so that everything still operates as it did prior to the patch. [3]

DLL Load-Order Hijacking

DLL load-order hijacking is a simple, covert technique that allows malware authors to create persistent, malicious DLLs without the need for a registry entry or trojanized binary. This technique does not even require a separate malicious loader, as it capitalizes on the way DLLs are loaded by Windows. [3] The default search order for loading DLLs on Windows XP is as follows: [3]

1. The directory from which the application loaded

2. The current directory

3. The system directory (the GetSystemDirectory function is used to get the path, such as .../Windows/System32/)

4. The 16-bit system directory (such as .../Windows/System/)

5. The Windows directory (the GetWindowsDirectory function is used to get the path, such as .../Windows/)

6. The directories listed in the PATH environment variable

Under Windows XP, the DLL loading process can be skipped by utilizing the KnownDLLs registry key, which contains a list of specific DLL locations, typically located in .../Windows/System32/. The KnownDLLs mechanism is designed to improve security (malicious DLLs can’t be placed higher in the load order) and speed (Windows does not need to conduct the default search in the preceding list), but it contains only a short list of the most important DLLs. DLL load-order hijacking can be used on binaries in directories other than /System32 that load DLLs in /System32 that are not protected by KnownDLLs. For example, explorer.exe in the /Windows directory loads ntshrui.dll found in /System32. Because ntshrui.dll is not a known DLL, the default search is followed, and the /Windows directory is checked before /System32. If a malicious DLL named ntshrui.dll is placed in /Windows, it will be loaded in place of the legitimate DLL. The malicious DLL can then load the real DLL to ensure that the system continues to run properly. Any startup binary not found in /System32 is vulnerable to this attack, and explorer.exe has roughly 50 vulnerable DLLs. Additionally, known DLLs are not fully protected due to recursive imports, and because many DLLs load other DLLs, which follow the default search order. [3]
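The search order and the ntshrui.dll example can be modeled in a few lines of Python. The sketch below builds the six-step candidate list from the text and returns the first location at which a DLL is found, which shows why a copy planted in the application directory wins over the legitimate one in System32. It deliberately ignores KnownDLLs, works on a supplied list of existing files instead of a real file system, and all function names are hypothetical.

```python
from pathlib import PureWindowsPath


def search_order(app_dir, current_dir, windows_dir, path_dirs):
    """Return the default Windows XP DLL search directories, in order."""
    win = PureWindowsPath(windows_dir)
    return [
        PureWindowsPath(app_dir),      # 1. directory the application loaded from
        PureWindowsPath(current_dir),  # 2. current directory
        win / "System32",              # 3. system directory
        win / "System",                # 4. 16-bit system directory
        win,                           # 5. Windows directory
        *[PureWindowsPath(p) for p in path_dirs],  # 6. PATH entries
    ]


def resolve_dll(name, order, existing_files):
    """Return the first candidate path that exists, or None if none does."""
    present = {PureWindowsPath(p) for p in existing_files}
    for directory in order:
        candidate = directory / name
        if candidate in present:  # PureWindowsPath compares case-insensitively
            return candidate
    return None


if __name__ == "__main__":
    # explorer.exe lives in C:\Windows, so that is its application directory.
    order = search_order(r"C:\Windows", r"C:\Users\alice", r"C:\Windows", [])
    files = [r"C:\Windows\System32\ntshrui.dll", r"C:\Windows\ntshrui.dll"]
    print(resolve_dll("ntshrui.dll", order, files))
```

Running the same resolution for every imported DLL that is not a known DLL gives a rough list of hijack candidates for a given binary.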

2.1.5 Privilege Escalation

Most users run as local administrators, which is good news for malware authors. This means that the user has administrator access on the machine, and can give the malware those same privileges. The security community recommends that users not run as local administrator, so that malware they run accidentally does not automatically have full access to the system. If a user launches malware on a system but is not running with administrator rights, the malware will usually need to perform a privilege-escalation attack to gain full access. The majority of privilege-escalation attacks are known exploits or zero-day attacks against the local OS, many of which can be found in the Metasploit Framework8. DLL load-order hijacking can even be used for privilege escalation. If the directory where the malicious DLL is located is writable by the user, and the process that loads the DLL is run at a higher privilege level, then the malicious DLL will gain escalated privileges. Malware that includes privilege escalation is relatively rare, but common enough that an analyst should be able to recognize it. Sometimes, even when the user is running as local administrator, the malware will require privilege escalation. Processes running on a Windows machine are run either at the user or the system level. Users generally can’t manipulate system-level processes, even if they are administrators. A common way that malware gains the privileges necessary to attack system-level processes on Windows machines is discussed next. [3]

8. http://www.metasploit.com/

Using SeDebugPrivilege

Processes run by a user don’t have free access to everything, and can’t, for instance, call functions like TerminateProcess or CreateRemoteThread on remote processes. One way that malware gains access to such functions is by setting the access token’s rights to enable SeDebugPrivilege. In Windows systems, an access token is an object that contains the security descriptor of a process. The security descriptor is used to specify the access rights of the owner, in this case the process. An access token can be adjusted by calling AdjustTokenPrivileges. The SeDebugPrivilege privilege was created as a tool for system-level debugging, but malware authors exploit it to gain full access to a system-level process. By default, SeDebugPrivilege is given only to local administrator accounts, and it is recognized that granting SeDebugPrivilege to anyone is essentially equivalent to giving them LocalSystem account access. A normal user account cannot give itself SeDebugPrivilege; the request will be denied. However, using vulnerability-targeted exploits, the attacker is able to bypass security countermeasures like these. [3]

2.1.6 User-Mode Rootkits

Malware often goes to great lengths to hide its running processes and persistence mechanisms from users. The most common tool used to hide malicious activity is referred to as a rootkit. Rootkits can come in many forms, but most of them work by modifying the internal functionality of the OS. These modifications cause files, processes, network connections, or other resources to be invisible to other programs, which makes it difficult for antivirus products, administrators, and security analysts to discover malicious activity. Some rootkits modify user-space applications, but the majority modify the kernel, since protection mechanisms, such as intrusion prevention systems, are installed and running at the kernel level. Both the rootkit and the defensive mechanisms are more effective when they run at the kernel level, rather than at the user level. At the kernel level, rootkits can corrupt the system more easily than at the user level. [3]

IAT Hooking

IAT hooking is a classic user-space rootkit method that hides files, processes, or network connections on the local system. This hooking method modifies the import address table (IAT) or the export address table (EAT). The IAT technique is an old and easily detectable form of hooking, so many modern rootkits use the more advanced inline hooking method instead. [3]

Inline Hooking

Inline hooking overwrites the API function code contained in the imported DLLs, so it must wait until the DLL is loaded to begin executing. IAT hooking simply modifies the pointers, but inline hooking changes the actual function code. A malicious rootkit performing inline hooking will often replace the start of the code with a jump that takes the execution to malicious code inserted by the rootkit. Alternatively, the rootkit can alter the code of the function to damage or change it, rather than jumping to malicious code. Since many defense programs expect inline hooks to be installed at the beginning of functions, some malware authors have attempted to insert the jmp or the code modification further into the API code to make it harder to find. [3]
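The simplest form of the check that defense programs perform, looking for a jump at the very beginning of a function, can be sketched in Python as follows. The code assumes 32-bit x86, recognizes only the two unconditional relative jump encodings (0xE9 with a 32-bit offset and 0xEB with an 8-bit offset), and flags the function when the jump leaves the address range of its own module. The function names and the address values in the example are hypothetical.

```python
import struct


def leading_jump_target(code, address):
    """Return the absolute target if code starts with a relative jmp, else None."""
    if len(code) >= 5 and code[0] == 0xE9:  # jmp rel32
        (rel,) = struct.unpack_from("<i", code, 1)
        return (address + 5 + rel) & 0xFFFFFFFF
    if len(code) >= 2 and code[0] == 0xEB:  # jmp rel8
        (rel,) = struct.unpack_from("<b", code, 1)
        return (address + 2 + rel) & 0xFFFFFFFF
    return None


def looks_hooked(code, address, module_start, module_end):
    """Flag a function whose first instruction jumps outside its own module."""
    target = leading_jump_target(code, address)
    return target is not None and not (module_start <= target < module_end)


if __name__ == "__main__":
    func_addr = 0x70001000
    # Hypothetical hook: jmp from the function start to 0x10000000.
    hook = b"\xE9" + struct.pack("<i", 0x10000000 - (func_addr + 5))
    # Common unhooked prologue: mov edi, edi; push ebp; mov ebp, esp.
    clean = bytes.fromhex("8bff558bec")
    print(looks_hooked(hook, func_addr, 0x70000000, 0x70100000))
    print(looks_hooked(clean, func_addr, 0x70000000, 0x70100000))
```

As the text notes, a hook placed further into the function body evades this check, so a more thorough scanner would compare the in-memory code against the copy of the DLL on disk.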

2.2 Launching Malware Silently

As discussed previously, the launcher is a type of malware that sets itself or another piece of malware for immediate or future covert execution. The goal of a launcher is to set up things so that the malicious behavior is concealed from a user. [3]

2.2.1 Launchers

Launchers often contain the malware that they’re designed to load. The most common example is an executable or DLL in its own resource section. The resource section in the Windows PE file format is used by the executable and is not considered part of the executable. Examples of the normal contents of the resource section include icons, images, menus, and strings. Launchers will often store malware within the resource section. When the launcher is run, it extracts an embedded executable or DLL from the resource section before launching it. Malware launchers often must be run with administrator privileges or escalate themselves to have those privileges. An average user process cannot perform all of the actions a launcher needs to do its job, so the launcher may contain privilege-escalation code that ensures that the malware is successfully run. [3]

2.2.2 Process Injection

The most popular covert launching technique is process injection. As the name implies, this technique injects code into another running process, and that process unwittingly executes the malicious code. Malware authors use process injection in an attempt to conceal the malicious behavior of their code, and sometimes they use this to try to bypass host-based firewalls and other process-specific security mechanisms. Certain Windows API calls are commonly used for process injection. For example, the VirtualAllocEx function can be used to allocate space in an external process’s memory, and WriteProcessMemory can be used to write data to that allocated space. This pair of functions is essential to the first three loading techniques discussed further. [3]

DLL Injection

DLL injection, a form of process injection where a remote process is forced to load a malicious DLL, is the most commonly used covert loading technique. DLL injection works by injecting code into a remote process that calls LoadLibrary, thereby forcing a DLL to be loaded in the context of that process. Once the compromised process loads the malicious DLL, the OS automatically calls the DLL’s DllMain function, which is defined by the author of the DLL. This function contains the malicious code and has as much access to the system as the process in which it is running. Malicious DLLs often have little content other than the DllMain function, and everything they do will appear to originate from the compromised process. The function CreateRemoteThread is commonly used for DLL injection to allow the launcher malware to create and execute a new thread in a remote process. When CreateRemoteThread is used, it is passed three important parameters: the process handle (hProcess) obtained with OpenProcess, along with the starting point of the injected thread (lpStartAddress) and an argument for that thread (lpParameter). For example, the starting point might be set to LoadLibrary and the malicious DLL name passed as the argument. This will trigger LoadLibrary to be run in the victim process with a parameter of the malicious DLL, thereby causing that DLL to be loaded in the victim process (assuming that LoadLibrary is available in the victim process’s memory space and that the malicious name string exists within that same space). Malware authors generally use VirtualAllocEx to create space for the malicious library name string. The VirtualAllocEx function allocates space in a remote process if a handle to that process is provided. The last setup function required before CreateRemoteThread can be called is WriteProcessMemory. This function writes the malicious library name string into the memory space that was allocated with VirtualAllocEx. [3]

Direct Injection

Like DLL injection, direct injection involves allocating and inserting code into the memory space of a remote process. Direct injection uses many of the same Windows API calls as DLL injection. The difference is that instead of writing a separate DLL and forcing the remote process to load it, direct injection malware injects the malicious code directly into the remote process. Direct injection is more flexible than DLL injection, but it requires a lot of customized code in order to run successfully without negatively impacting the host process. This technique can be used to inject compiled code, but more often, it’s used to inject shellcode. Three functions are commonly found in cases of direct injection: VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread. There will typically be two calls to VirtualAllocEx and WriteProcessMemory. The first will allocate and write the data used by the remote thread, and the second will allocate and write the remote thread code. The call to CreateRemoteThread will contain the location of the remote thread code (lpStartAddress) and the data (lpParameter). Since the data and functions used by the remote thread must exist in the victim process, normal compilation procedures will not work. For example, strings are not in the normal .data section, and LoadLibrary/GetProcAddress will need to be called to access functions that are not already loaded. There are other restrictions, which are not stated here. Basically, direct injection requires that authors either be skilled assembly language coders or that they will inject only relatively simple shellcode. [3]
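Because both DLL injection and direct injection rely on the same three API calls, their joint presence in a sample's import list is a useful static indicator. The Python sketch below checks a list of imported function names for that combination. The function names are hypothetical, the import list is assumed to have been extracted beforehand by some PE parsing tool, and a match is only a hint: debuggers and other legitimate tools import the same functions, and malware can resolve them at run time to keep them out of the import table.

```python
# API calls named in the text as common to DLL injection and direct injection.
INJECTION_APIS = frozenset({"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"})


def missing_injection_apis(imports):
    """Return the injection-related APIs that are absent from the import list."""
    return set(INJECTION_APIS) - set(imports)


def may_inject(imports):
    """True when every API needed for remote-thread injection is imported."""
    return not missing_injection_apis(imports)


if __name__ == "__main__":
    # Hypothetical import list of a sample under analysis.
    sample_imports = [
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
        "CreateRemoteThread", "CloseHandle",
    ]
    print(may_inject(sample_imports))
    print(missing_injection_apis(["OpenProcess", "VirtualAllocEx"]))
```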

2.2.3 Process Replacement

Rather than inject code into a host program, some malware uses a method known as process replacement to overwrite the memory space of a running process with a malicious executable. Process replacement is used when a malware author wants to disguise malware as a legitimate process, without the risk of crashing a process through the use of process injection. This technique provides the malware with the same privileges as the process it is replacing. For example, if a piece of malware were to perform a process-replacement attack on svchost.exe, the user would see a process named svchost.exe running from C:\Windows\System32 and probably think nothing of it. (This is a common malware attack, by the way.) Key to process replacement is creating a process in a suspended state. This means that the process will be loaded into memory, but the primary thread of the process is suspended. The program will not do anything until an external program resumes the primary thread, causing the program to start running. In the final step, the malware restores the victim process environment so that the malicious code can run by calling SetThreadContext to set the entry point to point to the malicious code. Finally, ResumeThread is called to initiate the malware, which has now replaced the victim process. Process replacement is an effective way for malware to appear non-malicious. By masquerading as the victim process, the malware is able to bypass firewalls or intrusion prevention systems (IPSs) and avoid detection by appearing to be a normal Windows process. Also, by using the original binary’s path, the malware deceives the savvy user who, when viewing a process listing, sees only the known and valid binary executing, with no idea that it was unmapped. [3]

2.2.4 Hook Injection

Hook injection describes a way to load malware that takes advantage of Windows hooks, which are used to intercept messages destined for applications. Malware authors can use hook injection to accomplish two things: [3]

• To be sure that malicious code will run whenever a particular message is intercepted

• To be sure that a particular DLL will be loaded in a victim process’s memory space

Local and Remote Hooks

There are two types of Windows hooks: [3]

• Local hooks are used to observe or manipulate messages destined for an internal process.

• Remote hooks are used to observe or manipulate messages destined for a remote process (another process on the system).

Remote hooks are available in two forms: high and low level. High-level remote hooks require that the hook procedure be an exported function contained in a DLL, which will be mapped by the OS into the process space of a hooked thread or all threads. Low-level remote hooks require that the hook procedure be contained in the process that installed the hook. This procedure is notified before the OS gets a chance to process the event. [3]

Keyloggers Using Hooks

Hook injection is frequently used in malicious applications known as keyloggers, which record keystrokes. Keystrokes can be captured by registering high- or low-level hooks using the WH_KEYBOARD or WH_KEYBOARD_LL hook procedure types, respectively.


For WH_KEYBOARD procedures, the hook will often be running in the context of a remote process, but it can also run in the process that installed the hook. For WH_KEYBOARD_LL procedures, the events are sent directly to the process that installed the hook, so the hook will be running in the context of the process that created it. Using either hook type, a keylogger can intercept keystrokes and log them to a file or alter them before passing them along to the process or system. [3]

Using SetWindowsHookEx

The principal function call used to perform remote Windows hooking is SetWindowsHookEx, which takes several parameters (one of these parameters is dwThreadId 9). The hook procedure can contain code to process messages as they come in from the system, or it can do nothing. Either way, the hook procedure must call CallNextHookEx, which ensures that the next hook procedure in the call chain gets the message and that the system continues to run properly. [3]

Thread Targeting

When targeting a specific dwThreadId, malware generally includes instructions for determining which system thread identifier to use, or it is designed to load into all threads. That said, malware will load into all threads only if it’s a keylogger or the equivalent (when the goal is message interception). However, loading into all threads can degrade the running system and may trigger an IPS. Therefore, if the goal is to simply load a DLL in a remote process, only a single thread will be injected in order to remain stealthy. Targeting a single thread requires a search of the process listing for the target process and can require that the malware run a program if the target process is not already running. If a malicious application hooks a Windows message that is used frequently, it’s more likely to trigger an IPS, so malware will often set a hook with a message that is not often used, such as WH_CBT (a computer-based training message). [3]

9. Specifies the identifier of the thread with which the hook procedure is to be associated. If this parameter is zero, the hook procedure is associated with all existing threads running in the same desktop as the calling thread. This must be set to zero for low-level hooks.

2.2.5 Detours

Detours is a library developed by Microsoft Research in 1999. It was originally intended as a way to easily instrument and extend existing OS and application functionality. The Detours library makes it possible for a developer to make application modifications simply. Malware authors like Detours, too, and they use the Detours library to perform import table modification, attach DLLs to existing program files, and add function hooks to running processes. Malware authors most commonly use Detours to add new DLLs to existing binaries on disk. The malware modifies the PE structure and creates a section named .detour, which is typically placed between the export table and any debug symbols. The .detour section contains the original PE header with a new import address table. The malware author then uses Detours to modify the PE header to point to the new import table, by using the setdll tool provided with the Detours library. Instead of using the official Microsoft Detours library, malware authors have been known to use alternative and custom methods to add a .detour section. [3]

2.2.6 APC Injection

Creating a thread using CreateRemoteThread can invoke functionality in a remote process. However, thread creation requires overhead, so it would be more efficient to invoke a function on an existing thread. This capability exists in Windows as the asynchronous procedure call (APC). APCs can direct a thread to execute some other code prior to executing its regular execution path. Every thread has a queue of APCs attached to it, and these are processed when the thread is in an alertable state, such as when it calls functions like WaitForSingleObjectEx, WaitForMultipleObjectsEx, and SleepEx. These functions essentially give the thread a chance to process the waiting APCs. If an application queues an APC while the thread is alertable but before the thread begins running, the thread begins by calling the APC function. A thread calls the APC functions one by one for all APCs in its APC queue. When the APC queue is complete, the thread continues running along its regular execution path. Malware authors use APCs to preempt threads in an alertable state in order to get immediate execution for their code. [3] APCs come in two forms: [3]

• An APC generated for the system or a driver is called a kernel-mode APC.

• An APC generated for an application is called a user-mode APC.

Malware generates user-mode APCs from both kernel and user space using APC injection.
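The queueing behavior described in this section can be illustrated with a toy model in Python. It is a simulation only and does not call any Windows API: the class and method names are hypothetical, and the model captures just two properties from the text, namely that queued APCs run only when the thread enters an alertable state and that they run one by one in queue order before the thread resumes its regular path.

```python
from collections import deque


class SimulatedThread:
    """Toy model of a thread with an APC queue (no real Windows calls)."""

    def __init__(self):
        self.apc_queue = deque()
        self.log = []

    def queue_apc(self, func, arg):
        """Model another thread queueing a function for this thread."""
        self.apc_queue.append((func, arg))

    def run(self, work):
        """Model regular, non-alertable execution: queued APCs stay pending."""
        self.log.append(work())

    def alertable_wait(self):
        """Model entering an alertable state: drain the queue in FIFO order."""
        while self.apc_queue:
            func, arg = self.apc_queue.popleft()
            self.log.append(func(arg))


if __name__ == "__main__":
    thread = SimulatedThread()
    thread.queue_apc(lambda x: f"apc:{x}", "first")
    thread.queue_apc(lambda x: f"apc:{x}", "second")
    thread.run(lambda: "regular work")  # APCs are still waiting here
    thread.alertable_wait()             # both APCs run now, in order
    print(thread.log)
```

The model also shows why malware prefers threads that wait alertably often: an APC queued to a thread that never enters such a state is never executed.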


APC Injection from User Space

From user space, another thread can queue a function to be invoked in a remote thread, using the API function QueueUserAPC. Because a thread must be in an alertable state in order to run a user-mode APC, malware will look to target threads in processes that are likely to go into that state. Luckily for the malware analyst, WaitForSingleObjectEx is the most common call in the Windows API, and there are usually many threads in the alertable state. An example process that is a popular target for APC injection is svchost.exe because its threads are often in an alertable state. Malware may APC-inject into every thread of svchost.exe just to ensure that execution occurs quickly. [3]

APC Injection from Kernel Space

Malware drivers and rootkits often wish to execute code in user space, but there is no easy way for them to do it. One method they use is to perform APC injection from kernel space to get their code executed in user space. A malicious driver can build an APC and dispatch a thread to execute it in a user-mode process (most often svchost.exe). APCs of this type often consist of shellcode. Device drivers leverage two major functions in order to utilize APCs: KeInitializeApc and KeInsertQueueApc. [3]

3 Leaving Footprints

Malware often leaves footprints behind. As stated before, these include registry entries, file fingerprint hashes, and the characteristic strings and functions it uses. If a group of malware samples behaves the same way, the samples can be placed in a category whose members share many properties. Based on these properties, malware can sometimes be caught even when there is no specific signature for that exact sample. Many techniques are commonly reused, which makes the security analyst’s job somewhat easier. However, in precisely targeted campaigns, attackers mainly use zero-day exploits and customized attack mechanisms. These mechanisms employ encoding, anti-disassembly, anti-debugging, and code-packing techniques, so the functions and strategies used by the malware are not given away easily and reverse engineering becomes difficult. Defenders in such campaigns are often in uncharted waters and must consider all possibilities. They must find every clue that can identify the malware’s functionality. Sometimes even small hints can help determine whether the victim’s system is infected. Whether the hints are subtle or obvious, they are key pieces of information that (hopefully for the good guys) lead to the malware’s doom. Specifying these hints is the first step toward creating a signature. Typing and spelling errors are not a big deal (to a certain degree) for the author of a tool, but they can reliably identify a malware family and variant. Because legitimate software is unlikely to contain the same misspelled strings, these attacker mistakes can pinpoint a particular piece of malware. This uniqueness is what can lead to a good, non-redundant signature.

3.1 Signatures

When a defender has finally dissected a threat, another technique comes into play. The goal is to describe the threat as generally as possible (so that slight variants are also detected) while still detecting the malware reliably and being able to classify its behavior. Signatures take advantage of classifications that put files into categories sharing similar properties. Such properties include the file name or format, the file contents (even binary), lists of exported functions, strings contained in the file, the process name, and so on. These classifications also allow the defender to use previously gained knowledge. If a certain piece of malware is a launcher, the defender knows that it can hide another downloaded piece of malware on the system. The defender can check whether the malware connects to a list of known malicious websites and whether it is obfuscated, in an attempt to reveal more information about it. The defender can also detect when the malware connects to a new URL, which indicates that the malware may have been changed and that the attacker has taken the defense mechanisms into account and is trying to bypass the signatures. The defender can then update the signature to reflect the new information. However, the changes may be drastic enough to warrant a new classification variant of the malware.
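One of the properties listed above, strings contained in the file, is enough to sketch what a very simple signature looks like. The Python example below defines a signature as a set of byte strings plus a minimum number of hits, which lets it tolerate slight variants that drop one of the strings. The class name, the family name, and the sample strings (including the deliberate misspelling) are all hypothetical and are not taken from any real malware or from any of the schemas discussed in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringSignature:
    """A signature that fires when enough characteristic strings are present."""
    family: str
    strings: tuple   # byte strings characteristic of the family
    min_hits: int    # how many of them must occur in the file

    def matches(self, data):
        hits = sum(1 for s in self.strings if s in data)
        return hits >= self.min_hits


if __name__ == "__main__":
    # Hypothetical signature built around a misspelled error message.
    sig = StringSignature(
        family="demo-family",
        strings=(b"Conection failed", b"cmd.exe /c", b"update.php?id="),
        min_hits=2,
    )
    sample = b"\x00\x01Conection failed\x00GET /update.php?id=7\x00"
    benign = b"Connection failed, please retry"
    print(sig.matches(sample), sig.matches(benign))
```

Lowering min_hits makes the signature more general at the cost of more false positives, which is the trade-off described at the start of this section.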

3.2 Tools for Effective Signature Creation

Many steps lead to the creation of a signature: staying anonymous while analyzing the malware, examining and running it and observing the system behavior, and finally creating a refined signature that can be reliably used to detect all possible variants of the malicious code.

3.2.1 Staying Anonymous

The justification for anonymity when researching malware and bad guys is pretty straightforward. Malware analysts do not want information to show up in logs and other records that might tie back to them or their organization. For example, let’s say they work at a financial firm and they recently detected that a banking trojan infected several of their systems. They collected malicious domain names, IP addresses, and other data related to the malware. The next steps they take in their research may lead them to websites owned by the criminals. As a result, if they are not taking precautions to stay anonymous, their IP address will show up in various logs and be visible to miscreants. If the criminals can identify the analysts or the organization from which they conduct their research, the criminals may change tactics or go into hiding, thus spoiling the investigation. Even worse, they may turn the tables and attack the analysts in a personal way (such as identity theft) or launch a distributed denial of service (DDoS) attack against their IP address. For example, the Storm worm initiated DDoS attacks against machines that scanned an infected system.10 [4]

10. http://www.securityfocus.com/news/11482

Various techniques for increasing anonymity are described in [4]. They include TOR, proxies, web-based anonymizers, cellular internet connections, and virtual private networks.

• TOR – The Onion Router, a network of computers in which each node processes an encrypted request and decrypts only a single layer (like peeling an onion). Finally, an exit node queries the destination server, which sends its response to the exit node, and the exit node sends the response back to the original client, again in encrypted form. Each intermediate computer knows only its next neighbor. Compromised exit nodes may exist, so it is advisable to be careful while using TOR, because these exit nodes may be sources of malware too.

• Proxies – A proxy server is a direct intermediate node between a client and a server. However, the proxy server (unlike the exit node in TOR) knows exactly who the client is. Some kinds of proxies also forward user requests with the user’s original IP address attached.

• Web-Based Anonymizers – Similar to a proxy server, these services act as an intermediate node between a client and a server. Setting up a proxy connection is not necessary, as the user gains proxy functionality by connecting to an anonymizer page that proxies the connection to the server. These services can also present the user’s IP address to the server.

• Alternate ways

◦ Cellular Internet Connections – The main strength of a connection through a cellular carrier is that no page can track the user down directly, because the carrier often assigns a new, dynamically generated IP address. The drawbacks may be lower speeds, lower signal strength, and additional data costs.

◦ Virtual Private Networks – When using this type of service, the user maintains an encrypted connection to a server that assigns an IP address, and the user’s computer is visible to other servers as if it belonged to a local network of the virtual private server (VPS).

As a rule of thumb, better solutions cost more. Free services are cheaper, but they carry the risk of spoiling the investigation, of not fully understanding the malware’s functionality, and of failing to create an effective signature.

Fortunately, there are ways to emulate a real system environment, e.g. network activity, without ever connecting to the real server that hosts additional malicious code, thus reducing the risk of leaking the malware analyst’s information to the attacker.


3.2.2 Creating a Secure and Controlled Environment

Executing and examining malicious code can harm the analyst’s system, so every controlled execution of malicious code should be run on either a virtual machine or a dedicated physical machine. The former has better properties for dynamic analysis¹¹ (it can take snapshots¹²), while the latter is better suited for use as a honeypot system¹³, which runs non-stop and gathers new trends and threats. However, malware can detect that it is running in a virtual machine and behave differently (anti-virtual machine techniques), so in some cases a physical machine may be the only option for malware analysis.

Tools That May Come in Handy

Running a piece of code always does something, be it a simple termination of the program, a file query or a file creation. This is where a tool like Process Monitor¹⁴ comes into action. Process Monitor is a tool that logs every action made by every running process, so the analyst can find out that a certain process creates a file in a temporary folder or accesses a URL. A simple example of its use can be seen in fig. 3.1.

Examining further, it may be interesting to find out whether a process has started other processes. This kind of information can be easily viewed in Process Explorer, which the user can even set to replace the standard Windows Task Manager for more control over running processes. A simple view of a process tree with a list of handles in the lower pane of the window is depicted in fig. 3.2. The lower pane can also display used DLLs.

Malicious code can also modify registry entries to make itself persistent, and another tool emerged because of this. RegShot¹⁵ can take a registry snapshot before and after running malware and display all differences between the two. Tracking down the registry entry is then trivial.

11. Analysis conducted by running a malware sample.
12. That is, capturing a full state of a system and being able to roll back to this point, discarding all changes.
13. A system dedicated to attract (lure them on sweet, honey-like properties such as being vulnerable or interesting in some way) and capture all different kinds of "malbear".
14. http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx
15. http://sourceforge.net/projects/regshot/


Figure 3.1: Filtering winamp.exe actions using Process Monitor

Figure 3.2: Displaying process tree and process handles in Process Explorer


In most cases, malware performs some kind of network activity once it runs. Analysts who want an isolated environment can leverage a technique provided by ApateDNS¹⁶. ‘Mandiant ApateDNS is a tool for controlling DNS responses through an easy-to-use GUI. As a phony DNS server, Mandiant ApateDNS spoofs DNS responses to a user-specified IP address by listening on UDP port 53 on the local machine. Mandiant ApateDNS also automatically sets the local DNS to localhost. Upon exiting the tool, it sets back the original local DNS settings.’[5]

The ApateDNS technique pairs very well with another tool, INetSim¹⁷. This piece of software is defined as ‘. . . a software suite for simulating common internet services in a lab environment, e.g. for analyzing the network behavior of unknown malware samples.’[6] Any DNS request made by the malware can be caught by ApateDNS and answered with the address of INetSim, which then provides an appropriate response to the malware’s request. This way, the network communication is not interrupted and the analyst can observe malware behavior in a secure environment with a fake network.

Last but not least is a packet sniffing tool, Wireshark¹⁸. ‘Wireshark is the world’s foremost network protocol analyzer. It lets you see what’s happening on your network at a microscopic level. It is the de facto (and often de jure) standard across many industries and educational institutions.’[7] It offers many features, such as capturing data for offline analysis, identifying protocols, following TCP streams and reporting.

All of the mentioned approaches are simpler ways of analyzing malware; more advanced analysis is based on reverse-engineering and debugging.
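The DNS redirection described above for ApateDNS can be illustrated with a minimal sketch. It is not ApateDNS’s implementation: the function names `build_spoofed_response` and `serve` are hypothetical, the sketch answers every query with an A record regardless of the requested type, and it assumes a well-formed query with a single question.

```python
import socket
import struct


def build_spoofed_response(query: bytes, ip: str) -> bytes:
    """Answer the first question of a DNS query with a fixed IPv4 address."""
    txid = query[:2]  # transaction ID, echoed back to the client
    # Walk the length-prefixed QNAME labels that start after the 12-byte header.
    pos = 12
    while query[pos] != 0:
        pos += 1 + query[pos]
    # Question section: QNAME, terminating zero byte, QTYPE (2), QCLASS (2).
    question = query[12:pos + 5]
    # Flags 0x8180: standard response, recursion desired and available.
    header = txid + struct.pack("!HHHHH", 0x8180, 1, 1, 0, 0)
    # Answer: pointer to the name at offset 12, type A, class IN, TTL 60, 4-byte address.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(ip)
    return header + question + answer


def serve(ip: str, host: str = "127.0.0.1", port: int = 53) -> None:
    """Reply to every DNS query received on host:port with the given address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # binding to port 53 usually needs elevated privileges
    while True:
        query, client = sock.recvfrom(512)
        sock.sendto(build_spoofed_response(query, ip), client)
```

Calling `serve("192.168.56.10")` (an assumed lab address of a machine running a service simulator) would make every name the sample resolves point at that machine.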

Virtual Machines

As mentioned before, virtual machines have some great advantages over normal systems. A snapshot is best taken just before running the malware. Once it is taken, the malware can be run and then another snapshot can be taken. The difference between these snapshots shows what the malware did. These differences are best examined with the tools mentioned in the previous section.
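The before/after comparison can be sketched in a few lines. This is only an illustration of the idea, not how RegShot or any hypervisor implements it: a snapshot is assumed to be a plain dictionary mapping a key (e.g. a registry path or a file name) to its value, and the function name `diff_snapshots` is hypothetical.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Return the entries that were added, removed or changed between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical registry state before and after running a sample.
before = {r"HKCU\Software\Demo\Version": "1.0"}
after = {
    r"HKCU\Software\Demo\Version": "1.0",
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\updater": r"C:\Temp\win.exe",
}
report = diff_snapshots(before, after)
```

Here `report["added"]` contains the new `Run` entry, which is the kind of persistence change the analyst is looking for.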

16. https://www.mandiant.com/resources/download/research-tool-mandiant-apatedns
17. http://www.inetsim.org/
18. http://www.wireshark.org/

4 Signature Formats

4.1 Mandiant’s OpenIOC

Mandiant’s IOC (Indicators of Compromise) format – more specifically OpenIOC – is a format for recording, defining, and sharing information that allows its users to share malware signatures and develop them rapidly. It comes with tools that make signature creation easy. OpenIOC can describe malware in approximately 500 ways, which include file hashes, process names, entry point bytes, digital signatures (or their absence) and many more. These indicators (or signatures) can be easily edited as malware analysts gather new intelligence.

Signatures are defined by an XML schema, which makes signature creation easier and gives analysts a solid backbone to build on. The XML format can be easily parsed and interpreted in various ways, showing important parts while hiding uninteresting code from the users. A signature can also be named and assigned keywords, and signatures can be grouped by these criteria into same-family signatures. This enables easier categorization and faster response times. Another advantage of this format is that XML is very widely used and can be easily processed automatically by tools. Hand-written signatures can be validated against the schema to catch badly designed ones; this step is not needed when using Mandiant’s GUI-based tools, which hide this complexity from the user. Users can even write their own custom indicators that suit their environment or threat.

4.1.1 OpenIOC

Indicators of Compromise (IOCs) are forensic artifacts of an intrusion that can be identified on a host or network. OpenIOC is a threat information sharing standard that allows you to logically group forensic artifacts, and communicate this information in a machine readable format. The terms are sometimes used interchangeably, but an IOC (also sometimes just called an Indicator) is a logically grouped set of descriptive terms (each called an ‘Indicator Term’) about a specific threat while OpenIOC is the language used to describe those specific sets (e.g. an incident response team would use the OpenIOC format to write multiple IOCs during the course of responding to an incident).

Indicator Terms are the name of the specific types of data elements that are included in IOCs. Indicator Terms are usually organized in Indicator Term Documents, which are groupings of indicators inside an XML document. When creating an IOC, an investigator can use as many or as few terms from as many or as few sources as they like. An organization desiring to extend OpenIOC to include new types of elements that are unique to their enterprise or circumstances would create and host an Indicator Term Document that contained the new Indicator Terms they wished to make available for others to use. [8]

4.1.2 IOC Functionality

Apart from stated use-case scenarios, users can also include items from more advanced forensic techniques, such as memory forensics, looking for artifacts that are much harder for attackers to change, or artifacts that attackers are more likely to recycle, such as running process components (including process handle names), the imports and exports used by an executable, and more. These can all be combined together in different logically grouped combinations, which are refined as the investigator learns more about the intrusion they are working on. Many different types of specific indicators can be combined together in one IOC, so that any of several sets of signatures of differing types of complexity could apply within one particular IOC.

Going beyond making logically complex indicators, there are also additional ways to use IOCs than just a straight query against a host. IOCs can also be used with logical operators to exclude entire classes of the hosts or network being examined when querying against harvested data sets. Instead of looking for a specific file using terms that have to precisely match, IOCs can also be used to match all of the files that should be on a particular part of a system. An investigator would collect unfiltered data from systems, and then run an IOC against the collection to look for the files that stand out. [8]

Simple use cases allow querying for forensic artifacts such as: [8]

• Looking for a specific file by MD5 sum (hash), file name, size, create date, or other file attributes

• Looking for a specific entity in Memory (process information, running service information)

• Looking for a specific entry or set of entries in the Windows Registry

• Combining these together in various combinations to provide better matching and less false positives than searches for individual artifacts.
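How such terms combine logically can be sketched with a heavily simplified indicator. The element and attribute names (`Indicator`, `IndicatorItem`, `Context`, `Content`, `operator`, `condition`, `search`) follow the OpenIOC format, but the namespace, identifiers and several attributes are omitted here, the hash is a placeholder, and the evaluator `matches` is a hypothetical illustration rather than the logic of any Mandiant tool.

```python
import xml.etree.ElementTree as ET

# Simplified indicator: file name AND (MD5 hash OR registry path).
IOC_XML = r"""
<Indicator operator="AND">
  <IndicatorItem condition="is">
    <Context search="FileItem/FileName"/>
    <Content>win.exe</Content>
  </IndicatorItem>
  <Indicator operator="OR">
    <IndicatorItem condition="is">
      <Context search="FileItem/Md5sum"/>
      <Content>d41d8cd98f00b204e9800998ecf8427e</Content>
    </IndicatorItem>
    <IndicatorItem condition="contains">
      <Context search="RegistryItem/Path"/>
      <Content>CurrentVersion\Run</Content>
    </IndicatorItem>
  </Indicator>
</Indicator>
"""


def matches(node, artifacts: dict) -> bool:
    """Evaluate an Indicator tree against artifacts keyed by indicator term."""
    if node.tag == "IndicatorItem":
        term = node.find("Context").get("search")
        expected = node.find("Content").text.lower()
        observed = artifacts.get(term, "").lower()
        if node.get("condition") == "contains":
            return expected in observed
        return observed == expected
    # An Indicator combines its children with AND or OR.
    results = [matches(child, artifacts) for child in node]
    return all(results) if node.get("operator") == "AND" else any(results)


root = ET.fromstring(IOC_XML)
host = {
    "FileItem/FileName": "win.exe",
    "RegistryItem/Path": r"HKLM\Software\Microsoft\Windows\CurrentVersion\Run",
}
hit = matches(root, host)
```

The combination reduces false positives: a host that only has a file named `win.exe`, with neither the hash nor the registry entry, does not match.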

More complex use cases and techniques combine these together and allow even more depth: [8]


• Instead of just looking for specific file artifacts in one part of the operating system or network, groups of artifacts can be combined together using the logic of OpenIOC to create a match on artifact groups that are common across families of malware or other intrusion tools (such as from the same authors or threat actor groups).

• Instead of hunting down a specific known bad file, an incident responder could make a whitelist of the files that were known to be in a directory, and then catch all the files that were NOT on that list, assuming the investigator had a full collection of what was on the system.
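The whitelist approach amounts to a set difference, as the following minimal sketch shows. The function name `unexpected_files` and the file names are hypothetical, and names are compared case-insensitively on the assumption of a Windows file system.

```python
def unexpected_files(collected, whitelist):
    """Return collected file names that are not on the whitelist."""
    allowed = {name.lower() for name in whitelist}
    return sorted(name for name in collected if name.lower() not in allowed)


# Hypothetical listing of a directory and the files known to belong there.
known_good = ["kernel32.dll", "user32.dll", "notepad.exe"]
collected = ["Kernel32.dll", "user32.dll", "notepad.exe", "xyz.dll"]
suspicious = unexpected_files(collected, known_good)
```

Only `xyz.dll` remains in `suspicious`; the result is meaningful only if `collected` really is a full listing of the directory.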

All currently supported IOC terms are available at [9]. A sample IOC file is attached in appendix B.

4.1.3 Using IOC in the Investigative Lifecycle

IOCs are a part of an effective and proven workflow that is at the core of MANDIANT’s own incident response process. The flexibility and machine-readable nature of the OpenIOC format are what makes this possible. The following outline shows some of the steps involved in the lifecycle of an investigation, and how OpenIOC and IOCs make that possible. Fig. 4.1 shows the OpenIOC lifecycle, which consists of the following phases: [8]

• Initial Evidence – Evidence of a compromise is detected, either on a host or on the network. This may be in response to law enforcement (LE) notification or an anomaly noticed from a variety of sources. Regardless of what led to it, responders investigate and identify something which is a concrete forensic indicator of an intrusion.

• Create IOCs for Host & Network – Following the initial discovery of forensic evidence, the investigators create an IOC from the existing data. The specific type of IOC created will vary based on the evidence, the environment, and the skill and comfort level of the investigator. The flexibility of OpenIOC allows a limitless number of permutations on how an Indicator can be crafted, so the investigator using OpenIOC has a lot of options as to how they want to proceed.

• Deploy IOCs in the Enterprise – Once an IOC or set of IOCs have been created, the investigator will deploy these to technology that can look for the existence of the IOC(s) on other systems or other parts of the network. In the MANDIANT workflow, these are fed into MANDIANT Intelligent Response™ (MIR) appliances, which then communicate with MIR Agents on hosts, or monitor network traffic. IOCs could be easily transformed and fed into IDS (Intrusion Detection System), IPS (Intrusion Prevention System), HIDS/HIPS (Host-based Intrusion Detection System/Host-based Intrusion Prevention System), SIEMS (Security Information and Event Management Systems), or other investigative tools to look into the enterprise.

• Identify Additional Suspect Systems – After deploying the IOCs into suitable technologies, additional systems will be identified, unless the first host was the only endpoint compromised.

• Collect Evidence – Additional evidence is acquired from the addi- tional systems that have been identified.

• Analyze Evidence – The additional data collected is analyzed. This can identify further intrusion, false positives, or additional intelligence for the investigators. This feedback then allows for the investigator to refine their searches and to return to the start of the workflow.

• Refine & Create New IOCs – The investigative team can create new IOCs based on their findings in the enterprise and additional intelligence, and continue to refine their cycle until they feel they either have exhausted the need to find new information, or other factors force them to stop investigating and move to remediation.

Figure 4.1: OpenIOC lifecycle, source: [8]

4.2 YARA

YARA is a tool aimed at helping malware researchers to identify and classify malware samples. With YARA you can create descriptions of malware families based on textual or binary patterns contained in samples of those families. Each description consists of a set of strings and a Boolean expression which determines its logic. Suppose that we have a malware family with two variants: one of them downloads a malicious file from a random site, http://samplesite.com/trojan1.exe, and the other downloads a file from another site, http://anothersamplesite.com/trojan2.exe. The URLs are hardcoded into the malware code. Both variants drop the downloaded file with the name win.exe, which also appears hardcoded in the samples. For this hypothetical family we can create a rule like this: [10]

    rule TrojanDownloader
    {
        strings:
            $a = "kernel32.dll"
            $b = "http://samplesite.com/trojan1.exe"
            $c = "http://anothersamplesite.com/trojan2.exe"
        condition:
            $a and ($b or $c)
    }

The rule above instructs YARA that every file containing the string kernel32.dll and any of the two URLs aliased by $b and $c must be reported as TrojanDownloader.
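To make the logic of the TrojanDownloader rule concrete, the same condition can be written as a plain Python predicate. This is only an emulation of what the rule expresses and does not use YARA; the function name `trojan_downloader` is hypothetical.

```python
def trojan_downloader(data: bytes) -> bool:
    """Emulate the condition `$a and ($b or $c)` on the raw bytes of a file."""
    a = b"kernel32.dll" in data
    b = b"http://samplesite.com/trojan1.exe" in data
    c = b"http://anothersamplesite.com/trojan2.exe" in data
    return a and (b or c)


# Two made-up byte strings standing in for file contents.
variant = b"MZ...kernel32.dll...http://anothersamplesite.com/trojan2.exe...win.exe"
benign = b"MZ...kernel32.dll...http://example.org/"
```

`trojan_downloader(variant)` is true because the second URL is present alongside kernel32.dll, while `benign` fails the condition because it contains neither URL.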

4.2.1 Creating Rules

YARA rules are easy to write and understand, and they have a syntax that resembles in some way a C struct declaration. The simplest rule that you can write for YARA, which does absolutely nothing, could look like this: [10]

    rule Dummy
    {
        condition:
            false
    }

Each rule in YARA starts with the keyword rule followed by a rule identifier. Identifiers must follow the same lexical conventions as in the C programming language: they can contain any alphanumeric character and the underscore character, but the first character cannot be a digit. Rule identifiers are case sensitive and cannot exceed 128 characters. The following keywords are reserved and cannot be used as an identifier: [10]

    and         filesize    or          section
    at          fullword    of          strings
    condition   is          private     them
    entrypoint  nocase      rule        true
    false       not         rva         widechar

Rules are generally composed of two sections: strings definition and condition, although the strings definition section can be omitted if the rule doesn’t rely on any string. The condition section is always required. The strings definition section is where the strings that will be part of the rule are defined. Each string has an identifier consisting of a $ character followed by a sequence of alphanumeric characters and underscores; these identifiers can be used in the condition section to refer to the corresponding string. Strings can be defined in text or hexadecimal form, as shown in the following example: [10]

    rule ExampleRule
    {
        strings:
            $my_text_string = "text here"
            $my_hex_string = { E2 34 A1 C8 23 FB }
        condition:
            $my_text_string or $my_hex_string
    }

Text strings are enclosed in double quotes just like in the C language. Hex strings are enclosed in curly brackets, and they are composed of a sequence of hexadecimal numbers that can appear contiguously or separated by spaces. Decimal numbers are not allowed in hex strings.

The condition section is where the logic of the rule resides. This section must contain a Boolean expression telling under which circumstances a file satisfies the rule or not. Generally, the condition will refer to previously defined strings by using the string identifier. In this context the string identifier acts as a Boolean variable which evaluates to true if the string was found in the file, or false otherwise. If the condition is true for a given file, the file matches the rule. [10]


Strings

There are three types of strings in YARA: hexadecimal strings, text strings and regular expressions. Hexadecimal strings are used for defining raw sequences of bytes, while text strings and regular expressions are useful for defining portions of legible text. However, text strings and regular expressions can also be used for representing raw bytes by means of escape sequences. [10]

• Hexadecimal strings – $string_hex = { E2 34 1C C8 A6 FB } or a wildcard one $string_hex = { E2 34 ?? C8 A? FB }

• Text strings – string in form of $string_text = "sample_string"

• Regular expressions – an example of a regular expression could be $regex = /md5: [0-9a-zA-Z]{32}/
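The meaning of the wildcards in a hexadecimal string can be illustrated by translating such a string into an equivalent Python byte regular expression. The function `hex_string_to_regex` is a hypothetical helper written for this illustration; it handles only plain bytes and the `?` wildcard and says nothing about how YARA itself matches hex strings.

```python
import re


def hex_string_to_regex(hex_string: str):
    """Translate a hex string such as '{ E2 34 ?? C8 A? FB }' into a bytes regex."""
    parts = []
    for token in hex_string.strip("{} ").split():
        hi, lo = token  # each token is exactly two characters
        if hi == "?" and lo == "?":
            parts.append(b".")  # any byte
        elif lo == "?":
            base = int(hi, 16) << 4  # fixed high nibble, any low nibble
            parts.append(b"[\\x%02x-\\x%02x]" % (base, base | 0x0F))
        elif hi == "?":
            low = int(lo, 16)  # fixed low nibble, any high nibble
            options = b"".join(b"\\x%02x" % (h << 4 | low) for h in range(16))
            parts.append(b"[" + options + b"]")
        else:
            parts.append(b"\\x%02x" % int(token, 16))
    # DOTALL lets '.' also match the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


pattern = hex_string_to_regex("{ E2 34 ?? C8 A? FB }")
found = pattern.search(bytes.fromhex("00E2341CC8A6FB"))
```

In this example `found` is a match starting at offset 1: `??` accepts the byte 1C and `A?` accepts A6, whereas a byte such as B6 in that position would not match.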

Conditions

Conditions are nothing more than Boolean expressions like those found in all programming languages, for example in an if statement. They can contain the typical Boolean operators and, or and not, and the relational operators >=, <=, <, >, == and !=. Also, the arithmetic operators (+, -, *, /) can be used on numerical expressions. String identifiers can also be used within a condition, acting as Boolean variables whose value depends on the presence or absence of the associated string in the file. The following rule definition from the YARA manual shows string identifiers combined with Boolean operators: [10]

    rule Example
    {
        strings:
            $a = "text1"
            $b = "text2"
            $c = "text3"
            $d = "text4"
        condition:
            ($a or $b) and ($c or $d)
    }

Restricting String Offsets

In the majority of cases, when a string identifier is used in a condition, we want to know whether the associated string is anywhere within the file, but sometimes we need to know whether the string is at some specific offset in the file. In such situations the operator at is what we need. While the at operator searches for a string at some fixed offset in the file, the in operator searches for the string within a range of offsets. [10]

Example #1: $a at 100 and $b at 200.
Example #2: $a in (0..100) and $b in (100..filesize).
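The two offset operators can be emulated on raw bytes as follows. This is a sketch of the semantics only and does not use YARA: the helper names `offsets`, `at` and `within` are hypothetical, and the range bounds are treated as inclusive, which is an assumption of this sketch.

```python
def offsets(data: bytes, pattern: bytes) -> list:
    """Return every offset at which pattern occurs in data (overlaps included)."""
    found = []
    start = data.find(pattern)
    while start != -1:
        found.append(start)
        start = data.find(pattern, start + 1)
    return found


def at(data: bytes, pattern: bytes, offset: int) -> bool:
    """Emulate `$s at offset`: the pattern starts exactly at the given offset."""
    return data.startswith(pattern, offset)


def within(data: bytes, pattern: bytes, low: int, high: int) -> bool:
    """Emulate `$s in (low..high)`: some occurrence starts inside the range."""
    return any(low <= o <= high for o in offsets(data, pattern))


# Made-up file contents: "AAAA" at offset 100 and "BBBB" at offset 200.
data = b"\x00" * 100 + b"AAAA" + b"\x00" * 96 + b"BBBB"
example_1 = at(data, b"AAAA", 100) and at(data, b"BBBB", 200)
example_2 = within(data, b"AAAA", 0, 100) and within(data, b"BBBB", 100, len(data))
```

Both `example_1` and `example_2` are true for this data; `len(data)` plays the role of filesize in the second expression.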

File Size

String identifiers are not the only variables that can appear in a condition (in fact, rules can be defined without any string definition); there are other special variables that can be used as well. One of these special variables is filesize, which holds, as its name indicates, the size of the file being analyzed. The size is expressed in bytes. A KB postfix, when attached to a numerical constant, automatically multiplies the value of the constant by 1024. The MB postfix can be used to multiply the value by 2^20 (1,048,576). Both postfixes can be used only with decimal constants. [10]

Example: filesize > 200KB.

Entry Point

Another special variable that can be used in a rule is entrypoint. If the file is a Portable Executable (PE), this variable holds the raw offset (not the RVA¹⁹) of the entry point of the executable. A typical use of this variable is to look for some pattern at the entry point to detect packers or simple PE infectors. The presence of the entrypoint variable in a rule implies that only PE files can satisfy that rule. If the file is not a PE file, any rule using this variable evaluates to false. [10]

Example #1: $a at entrypoint.
Example #2: $a in (entrypoint..entrypoint + 10).

Counting Strings

Sometimes we need to know not only whether a certain string is in the file, but also how many times it appears there. The number of occurrences of each string is represented by a variable whose name is the string identifier with a # character in place of the $ character. For example, when the string definition is $a = "something", the condition stating that the file must contain this string exactly 4 times is #a == 4 (the same goes for operators like < and >). [10]

19. Relative Virtual Address

Sets of Expressions

There are circumstances in which it is necessary to express that the file should satisfy a certain number of expressions from a given set. None of the expressions is required to be true, but at least some of them should be. In these situations the operator of comes to help. The number before the of keyword must be equal to or less than the number of expressions that appear between the parentheses. You can also replace the list of expressions with the keyword them. This is syntactic sugar equivalent to a list containing all the strings defined in the rule. [10]

Example #1: 2 of ($a,$b at 300,#c > 5).
Example #2: 3 of them, which is equivalent to 3 of ($a,$b,$c,$d).
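The of operator counts how many expressions of a set hold and compares that count with a threshold. The sketch below emulates Example #1 on raw bytes without using YARA; the helper `n_of`, the byte strings standing in for $a, $b and $c, and the sample data are all hypothetical.

```python
def n_of(n: int, *conditions: bool) -> bool:
    """Emulate `n of (...)`: at least n of the given expressions are true."""
    return sum(bool(c) for c in conditions) >= n


# Made-up contents: no "alpha", "beta" at offset 300, "gamma" six times.
data = b"x" * 300 + b"beta" + b"gamma" * 6

a = b"alpha" in data                   # $a
b_at_300 = data.startswith(b"beta", 300)  # $b at 300
c_count = data.count(b"gamma")         # #c

result = n_of(2, a, b_at_300, c_count > 5)
```

Here `result` is true even though $a is absent, because the other two expressions of the set hold.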

Referencing Other Rules

When writing the condition for a rule you can also refer to a previously defined rule in a manner that resembles a function invocation in traditional programming languages. In this way you can create rules that depend on others. Note that it is strictly necessary to define the rule being invoked before the one that makes the invocation. [10]

Example: Let there be a defined rule Rule1: rule Rule1 {...}. Then it is possible to reference this rule in another rule’s condition: rule Rule2 {... condition: $a and Rule1}, where $a is an arbitrary string. Rule2 will hit only if the file satisfies both $a and Rule1.

4.2.2 Release of Version 2.0

While this thesis was being finished, a new version of YARA was released on December 26th, 2013. This version brings many new features, such as documentation of the supported regular expression syntax and, with it, YARA’s own regular expression engine. The old version of the manual is no longer available at [10]; the new version is available at [11]. YARA has experienced an almost complete rewrite for version 2.0; as a result, this new version has the following advantages over previous ones: [12]

• With YARA 2.0 scanning speed is from 2X to 100X faster depending on your rules. The 100X speedup is only experienced with certain corner cases, but if you have a large and diverse set of rules you’ll definitely notice the improvement.

• Better multi-threading support. Previous versions of the YARA tool were thread-safe up to a certain level. You could compile rules and scan multiple files simultaneously, provided that each thread was using its own set of compiled rules. In YARA 2.0 multiple threads can share the same compiled rules to scan multiple files at the same time. YARA’s new command-line scanner takes advantage of that and is now multi-threaded, allowing it to scan whole directories blazingly fast.


• Rules can be saved to binary form. In the same way you would compile your program’s source code to create an executable file, with YARA 2.0 you can compile your rules and save them into a binary file for later use. This way you can use pre-compiled rules without having to parse them again, or you can share rules with someone else without revealing the actual source code (though it is frowned upon).

The drawbacks for this rewrite are:

• You can find some incompatibilities in regular expressions. YARA 2.0 replaced external libraries like PCRE or RE2 with its own regular expression engine. Most regular expression features are present in the new implementation, but a few, like POSIX character classes and backreferences, are missing. If you were using RE2 instead of PCRE with previous versions of YARA you won’t miss backreferences, because RE2 doesn’t support them either.

• The C API provided by libyara has changed. If you’re a developer using this API you’ll need to make some changes to your application in order to adapt it to YARA 2.0. But don’t worry, it won’t be too much work and the benefits are worth the effort. Users of yara-python are not affected; the Python interface remains the same.

4.2.3 Advantages of YARA format

YARA is completely open source, so anyone can contribute functionality and view and modify its code to tailor it to their needs. There are also many groups that release YARA signatures to the public. YARA is very lightweight, which suits users who do not want to work with large frameworks, and there are many tools that can easily take the standard output of the YARA scanner and process it in many ways. [10] With the release of the new version, YARA’s functionality in the field of regular expression matching has greatly increased.

A good example of such processing is the following simple script, which automatically unpacks any file hit by YARA signatures looking for UPX-packed files.²⁰

20. Script taken from [4]


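    #!/usr/bin/python
    # Ported to Python 3: subprocess.getoutput replaces commands.getoutput.
    import subprocess
    import sys

    import yara

    # Compile the rules file given as the first argument.
    rules = yara.compile(sys.argv[1])
    # Read the sample given as the second argument and scan it.
    data = open(sys.argv[2], 'rb').read()
    matches = rules.match(data=data)
    # Unpack the sample if any rule whose name starts with "UPX" matched.
    isupx = [m for m in matches if m.rule.startswith("UPX")]
    if isupx:
        outp = subprocess.getoutput("upx -d %s" % sys.argv[2])
        print(outp)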

4.3 MAEC

Malware Attribute Enumeration and Characterization (MAEC) is a standardized language for encoding and communicating high-fidelity information about malware based upon attributes such as behaviors, artifacts, and attack patterns.

By eliminating the ambiguity and inaccuracy that currently exists in malware descriptions and by reducing reliance on signatures, MAEC aims to improve human-to-human, human-to-tool, tool-to-tool, and tool-to-human communication about malware; reduce possible duplication of malware analysis efforts by researchers; and allow for the faster development of countermeasures by enabling the ability to leverage responses to previously observed malware instances.

MAEC in its current state is composed of a data model that spans several interconnected schemas, thus representing the grammar that defines the language. These schemas permit different forms of MAEC output to be generated, which can be considered as specific uses of the aforementioned grammar. [13]

4.3.1 MAEC Language

The MAEC Language is defined by three data models, each of which is implemented by its own XML schema. The highest model is the MAEC Container, the middle one is the MAEC Package and the lowest is the MAEC Bundle. Each model can be used separately; however, every model requires all the lower models. This modularization enables flexible use cases and data sharing.

Figure 4.2: High-level MAEC overview, source: [13]

Low Level – Abstracted Actions

At the lowest level, actions describe only the basic properties of the malware, for example the creation of a registry key or of a file. These actions are not seen in any context, so they are not logically connected to any other events. This low-level abstraction answers the question ‘What does the piece of malware do?’ but not the question ‘Why does it do this?’. However, abstracting from this ‘Why?’ inquiry can be used to construct more abstracted and precise grammars. It also allows malware to be compared and similarities to be found at the lowest level.

Mid Level – Behaviors

This tier of the structure defines the purpose behind the actions at the low level. It describes the behavior of the malware and looks for interconnections between actions. Behaviors differentiate between similar actions that have different purposes; for example, a created file could mean the system is waiting for an attacker, but it could also mean that the system is already infected and listening to the commands.

High Level – Mechanisms

The mechanism level is the most abstract one, as it provides views similar to database views. These views help present relevant data to users in the way they want. Different security specializations have different fields of interest, and by using views these groups of people can select the data that are interesting to them. They can view payload data, network communication parts, persistence mechanisms etc., and they can view lower-level parts grouped into more concise information.

Example Mapping

As a very simple example of how a malicious activity can be mapped between the MAEC Bundle levels, let’s say that a malware instance calls the Windows CreateFile API to create the file xyz.dll. This event would first be mapped to the ‘Create File’ Action element, and after further investigation, we might conclude that this file was created as a means of instantiating a malicious binary on a system, thus mapping to a ‘Malicious Binary Instantiation’ Behavior. Finally, the ‘Malicious Binary Instantiation’ Behavior could be considered part of a malware ‘Persistence’ Mechanism. [13]
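The three levels of this example mapping can be pictured as nested records. The classes below are a hypothetical illustration of the layering only; they are not the types defined by the MAEC schemas, and the helper `actions_of` is likewise invented for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Action:  # low level: what happened
    name: str
    target: str


@dataclass
class Behavior:  # mid level: why a group of actions happened
    name: str
    actions: list = field(default_factory=list)


@dataclass
class Mechanism:  # high level: a view grouping related behaviors
    name: str
    behaviors: list = field(default_factory=list)


def actions_of(mechanism: Mechanism) -> list:
    """Collect all low-level actions that belong to a mechanism."""
    return [a for b in mechanism.behaviors for a in b.actions]


create_file = Action("Create File", "xyz.dll")
instantiation = Behavior("Malicious Binary Instantiation", [create_file])
persistence = Mechanism("Persistence", [instantiation])
```

Walking down from `persistence` with `actions_of` reaches the single `Create File` action on xyz.dll, mirroring the mapping described above.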

Figure 4.3: MAEC Bundle Data Model Example, source: [13]


The MAEC Bundle Output Format

The MAEC Bundle XML schema is the current output format which describes a piece of malware as a MAEC Bundle schema instance. The MAEC Bundle schema serves as a container and transport mechanism for use in storing and subsequently sharing MAEC-encoded information about malware, which may include MAEC Actions and Behaviors as well as other attributes obtained from the characterization of a malware instance. [13]

Figure 4.4: MAEC Bundle Schema Overview, source: [13]

A MAEC Bundle is very flexible and can be used to describe anything from a particular insertion method (composed of several low-level Actions and mid-level Behaviors) to any or all of the attributes listed in Figure 4.4. A MAEC Bundle can contain intelligence-derived indicators as well as other signatures and patterns useful in network and host-based intrusion detection. [13] High level definitions of the basic components of the MAEC Bundle schema are given below: [13]

• Malware Instance Object Attributes – Details of the malware instance object that the MAEC Bundle characterizes using its enumerations and schema. Most commonly, this is a file object with a few attributes, such as name, size, and cryptographic hashes.


• Process Tree – Specifies the observed process tree of execution for the malware instance.

• AV Classifications – Captures any Anti-Virus scanner tool classifi- cations of the malware instance object.

• Behaviors – Encompasses all of the MAEC Behaviors in the MAEC Bundle. Each Behavior element can contain information such as a text description, related Actions, and relationships to other Behaviors.

• Actions – Encompasses all of the MAEC Actions in the MAEC Bundle. Each Action element can contain information such as the type of Action that it represents (e.g., ‘create file’, ‘copy file’), discovery method and associated tools, and relationships to other Actions.

• Objects – Encompasses all of the MAEC Objects in the MAEC Bundle. Each Object element can contain information such as the type of Object that it represents (e.g., ‘file’, ‘process’), specific properties of the Object (e.g., ‘file name’, ‘process name’), and relationships to other Objects.

• Candidate Indicators – Encompasses all of the MAEC Candidate Indicators in the Bundle. Each Candidate Indicator element can contain information such as importance, author, description, and target information.

• Collections – Encompasses all of the MAEC Collections in the MAEC Bundle: Behavior Collections, Action Collections, Object Collections, and Candidate Indicator Collections. Each Collection can contain information such as a text description of the Collection, a characterization of how the elements are related, and a list of the Behaviors/Actions/Objects/Candidate Indicators themselves.

The MAEC Package Output Format

The MAEC Package XML schema is currently the standard output format that can be used to describe one or more Malware Subjects using MAEC’s enumerations and schema. As illustrated in fig. 4.5, the content of a MAEC Package includes a set of Malware Subjects and Grouping Relationship information, where the content of a Malware Subject includes additional information: Malware Subject Object Attributes, minor variant information, field data, analysis information, MAEC Bundles associated with the Malware Subject, and information about the relationships between the Malware Subject of focus and other Malware Subjects. In essence, a MAEC Package enables MAEC Bundle management, allowing users to share multiple MAEC Bundles and associated metadata for one or more Malware Subjects. [13]

Figure 4.5: MAEC Package Schema Overview, source: [13]

High-level definitions of the components of the MAEC Package schema are as follows: [13]

• Malware Subject – represents a single malware object (most commonly a file) and its associated metadata:
  ◦ Malware Instance Object Attributes – Details of the object that represents the malware instance characterized by the Malware Subject. Note that this information may be repeated in a MAEC Bundle if the MAEC Bundle is to be self-contained.
  ◦ Minor Variants – Captures any observed minor variants of the malware instance, such as the same file but with different names.
  ◦ Field Data – Captures field data and prevalence information relating to the malware instance characterized by the Malware Subject. It imports and uses the metadata:fieldDataEntry type from MMDEF21.
  ◦ Analyses – Captures analysis-related details such as analyst, source, summary, and tool information. Analyses can reference one or more individual MAEC Bundles to denote that the findings of the analysis are captured in the MAEC Bundles.
  ◦ Findings Bundles – Set of MAEC Bundles pertaining to the Malware Subject of focus. For example, these MAEC Bundles could capture the output of different tools, some data obtained through manual analysis, etc. The term ‘Findings_Bundles’ is used rather than simply ‘Bundles’ to imply that the content was derived from analysis.
  ◦ Relationships – Captures bi-directional relationships between the Malware Subject of focus and other Malware Subjects. Examples include ’downloaded by’, ’dropped by’, ’downloads’ and ’drops’.

21. IEEE ICSG’s Malware Metadata Exchange Format http://standards.ieee.org/develop/indconn/icsg/mmdef.html

• Grouping Relationship – Specifies the particular relationship between all of the Malware Subjects encompassed in the MAEC Package. Example relationships include ’same malware family’ and ’clustered together’ (possibly by a malware analysis clustering algorithm).

4.3.2 High Level Use Cases for the MAEC Language

At its highest level, MAEC is a domain-specific language for non-signature-based malware characterization. Because MAEC provides a common vocabulary and grammar for the malware domain, it follows that the majority of the use cases for MAEC are motivated by the unambiguous and accurate communication of malware attributes enabled by MAEC. [13] High-level use cases for the MAEC language are as follows: [13]

• Malware analysis – MAEC will typically be used to encode the data garnered from malware analysis. In such a scenario, a malware instance is analyzed automatically or manually using either dynamic or static methods. The results are then captured using the MAEC schema and either a single MAEC Package (with one or more MAEC Bundles) or one or more standalone MAEC Bundles are generated to communicate the analysis results. MAEC Packages and MAEC Bundles can also be used to help with visualization, to capture data for storage in analysis-oriented repositories, and as a means for standardizing tool output.


• Cyber threat analysis – Beyond analysis of a particular malware instance, an organization defending against cyber adversaries often engages in the broader task of cyber threat analysis, that is, the collection and analysis of cyber attack and threat information in relation to the organization’s potential vulnerabilities. Cyber threat information includes analysis results of malware instances, along with additional threat data such as intent and kill-chain information and adversary tools, techniques, and procedures. For successful cyber threat analysis, detailed analysis information about the malware instances must be obtained. For example, triage procedures may reveal information such as spear-phishing email headers or URLs to malicious websites, while in-depth malware analysis may uncover command and control domain names and IP addresses.

• Intrusion detection – Effective intrusion detection is central to keeping networks safe from malicious actors. Using MAEC to characterize malware based on its attributes provides actionable information for malware detection and assessment: more specifically, low-level Objects and Actions and mid-level Behaviors enable malware detection. Unlike a physical signature, a single MAEC characterization, represented by a MAEC Bundle or MAEC Package, can provide data that can be used to detect multiple malware instances. Because there are a finite number of ways of implementing a particular software behavior (for instance, keylogging), particularly at the assembly level, there is likely to be an intersection of such attributes between multiple malware instances.

• Incident management – When a cyber incident occurs, a defending organization must coordinate its response among a team of analysts and decision makers. In some cases, the organization may solicit help from Computer Security Incident Response Teams (CSIRTs), law enforcement, Internet Service Providers (ISPs), or product vendors. Regardless of the underlying threat, when numerous people or parties are involved, even within the same organization, effective incident management is extremely important.

4.3.3 Advantages of MAEC format

The MAEC format is highly extensible and provides a way for third-party products to be reviewed and registered as MAEC-Compatible software. A product is granted this status once it complies with the requirements for MAEC content creation, storage and consumption. This gives other companies and individuals tools that help with creating signature files and with understanding their capabilities in ways that are easy for the end user. MAEC is linked to OVAL22, CPE23, CVE24 and CWE25, so its categorization is interconnected with other sources, and it is very well documented. Its primary focus is on distinguishing between similar malware threats and on describing these threats as precisely as possible.

4.3.4 Disadvantages of MAEC format

The MAEC language is very complicated, and an implementation of it must be carried out very carefully. Another disadvantage is that the language serves only to describe malware. It excels at that, and it can be used to distinguish between similar malware properties that otherwise could not be separated; however, the specification does not come with any malware scanning utilities.

4.4 Comparison of Mentioned Formats

Properties      YARA          OpenIOC     MAEC/MITRE
Signatures      plain text    XML         XML
Able to scan    Yes           Yes         No
Platforms       All ¹         All ²
Proprietary     No            Yes ³       No ⁴

¹ Written in Python language.
² No platform requirements for XML files.
³ Paid support and indicator releases.
⁴ Free for public use.

Table 4.1: Comparison of format properties

The main strengths of the YARA format are that it is lightweight and very flexible (it supports regular expressions in scanning, which is a very useful feature), and the new version 2.0 is promising in terms of speed, multi-threading support and its own documented regular expression engine. Its disadvantage is weaker expressive power, mainly because it is not a proprietary format (it is driven by contributors, not by financial gain), and it cannot be extended as easily as IOC, whose rules are defined in an XML format in which the user knows every single supported item26. An ideal format would have the ambitions, properties and expressive power of the MAEC format, the scanning capabilities (and tools) of the IOC format and the flexibility of the YARA format. The differences between the formats discussed here can be seen in the sample signatures in the appendices: a sample IOC signature is given in appendix B, a YARA signature in appendix C and a MAEC signature in appendix D.

22. Open Vulnerability and Assessment Language – http://oval.mitre.org/
23. Common Platform Enumeration – http://cpe.mitre.org/
24. Common Vulnerabilities and Exposures – http://cve.mitre.org/
25. Common Weakness Enumeration – http://cwe.mitre.org/

26. As mentioned, all current terms are available at [9]

5 A Tool for Malware Signature Conversion

This thesis presents a tool that demonstrates approaches which could be extended for wider use. Its requirements aim at making it as simple as possible (simple to use, not necessarily simple to implement) and as platform independent as possible.

5.1 Requirements

The first and most basic requirement is platform independence. Many languages are platform independent, but not all of them have good framework support. The second requirement is an easily extensible framework, which should offer several ways to achieve a desired effect and should be well known and well supported. Furthermore, from the user’s point of view, the tool should not require any additional software to be installed. With all this in mind, the best way to meet these requirements is to create a web application using the Java language and the Spring Framework.

5.2 Technology Used

5.2.1 Java

The chosen language is Java, an object-oriented, platform-independent language. Java source code is organized in interconnected classes, and compiling it produces bytecode. This bytecode can be run on any device that has a Java virtual machine. This is the only requirement for a Java program to run (aside from some minimum memory, disk and CPU requirements). Any user who wishes to run a Java application must have at least a JRE (Java Runtime Environment) installed. The other option is to install a JDK (Java Development Kit), which is required to develop Java applications and which itself contains a JRE.

5.2.2 Spring Framework

As previously mentioned, the implementation language is Java. The framework used is the Spring Framework, which is based on Java. This framework is widely used and supports technologies and approaches such as Dependency Injection27, Aspect-Oriented Programming28, web services etc.

27. An approach where a dependency is declared in Spring’s application context file (many properties are also defined there) independently of other Java code; it is then injected into Java code only when needed.
28. An interceptor for various events, which can modify these events to the developer’s liking.

5.2.3 Server

No specific server is required; a simple servlet container such as Apache Tomcat is sufficient.

5.3 The Tool

The tool is called Malware Signature Converter and it converts malware signatures (currently only from YARA to IOC). YARA’s strings scanning capability is the one used most frequently, so the tool converts YARA string signatures to IOC’s FileItem/StringList/string terms. YARA logical conditions (even nested ones) are parsed using the Shunting-Yard Algorithm [14]. The condition is parsed and translated into Reverse Polish notation (RPN), which has no brackets: instead of grouping values and operators with brackets, it groups them by the order in which they appear. For example, the expression (a + b) * (c + d) in standard (infix) notation has the RPN equivalent a b + c d + * [14]. The latter form is evaluated as follows: take the first two values (a and b) and apply the first ’+’ to them. Then take the next two values (c and d) and apply the second ’+’ to them. Now there are two groups, (a + b) and (c + d), and the operator ’*’ is applied to them. This postfix notation is then evaluated using a binary tree in which each node is either an element or an operator. This is a divide-and-conquer approach: it breaks a complex logical expression down into simple ones, which are then easy to evaluate.
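The translation of a condition into Reverse Polish notation and then into a binary tree, as described in section 5.3, can be sketched as follows. This is a minimal illustration and not the tool's actual source code: the class and method names are hypothetical, the input is assumed to be already split into tokens, only the operators and, or and brackets are handled, and "and" is assumed to bind more tightly than "or".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only; all names here are hypothetical.
final class ConditionParserSketch {

    /** Binary tree node: a leaf holds an operand, an inner node holds an operator. */
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? value : "(" + left + " " + value + " " + right + ")";
        }
    }

    static boolean isOperator(String token) {
        return token.equals("and") || token.equals("or");
    }

    // Assumption: "and" binds tighter than "or"; both are left-associative.
    static int precedence(String operator) {
        return operator.equals("and") ? 2 : 1;
    }

    /** Shunting-yard: converts infix tokens to Reverse Polish notation. */
    static List<String> toRpn(List<String> tokens) {
        List<String> output = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String token : tokens) {
            if (isOperator(token)) {
                // Pop operators of equal or higher precedence before pushing this one.
                while (!stack.isEmpty() && isOperator(stack.peek())
                        && precedence(stack.peek()) >= precedence(token)) {
                    output.add(stack.pop());
                }
                stack.push(token);
            } else if (token.equals("(")) {
                stack.push(token);
            } else if (token.equals(")")) {
                while (!stack.isEmpty() && !stack.peek().equals("(")) {
                    output.add(stack.pop());
                }
                if (stack.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')'");
                }
                stack.pop(); // discard the matching "("
            } else {
                output.add(token); // operand, e.g. a string identifier
            }
        }
        while (!stack.isEmpty()) {
            String operator = stack.pop();
            if (operator.equals("(")) {
                throw new IllegalArgumentException("Unbalanced '('");
            }
            output.add(operator);
        }
        return output;
    }

    /** Builds the expression tree by replaying the RPN sequence on a stack. */
    static Node toTree(List<String> rpn) {
        Deque<Node> stack = new ArrayDeque<>();
        for (String token : rpn) {
            if (isOperator(token)) {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("Missing operand for " + token);
                }
                Node right = stack.pop();
                Node left = stack.pop();
                stack.push(new Node(token, left, right));
            } else {
                stack.push(new Node(token, null, null));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed expression");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "(", "$a", "or", "$b", ")", "and", "(", "$c", "or", "$d", ")");
        List<String> rpn = toRpn(tokens);
        System.out.println(rpn);         // [$a, $b, or, $c, $d, or, and]
        System.out.println(toTree(rpn)); // (($a or $b) and ($c or $d))
    }
}
```

The example mirrors the arithmetic expression from the text, with "or" in place of '+' and "and" in place of '*'. An expression such as 1 of them would have to be passed in as a single operand token for this sketch to handle it.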

5.4 Internal Format

5.4.1 Workflow

Malware Signature Converter takes a YARA signature as input. Because this signature is not in XML format, parsing the plain text file is more complicated than parsing an XML file would be. The tool therefore includes its own parser for YARA signature files; parsing XML is straightforward. The tool supports the following YARA conditions:

Condition                                   Comment
Simple                                      A simple condition means that a named string, e.g. $my_string, will be replaced by its direct value, e.g. "String in a file"
All of them                                 A new node will be created, which will contain all strings defined in a rule
All of $my_string*                          A new node will be created, which will contain all strings defined in a rule that have the "my_string" prefix
Some of them                                Either any or some number of them, e.g. 3 of them, meaning that at least three strings should be present in the file; any of them is equal to 1 of them
Some of ($my_string1, $my_string2, . . . )  Similar to Some of them, creates a node with a specified subset of enumerated, comma-delimited strings

Table 5.1: Table of supported YARA conditions
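How the set conditions in Table 5.1 could be expanded into an explicit list of strings is sketched below. This is only an illustration under stated assumptions: the class, the method names and the StringSet type are hypothetical and not taken from the tool, and the quantifier ("all", "any" or a number) and the selector ("them", a prefix with an asterisk, or an enumeration in brackets) are assumed to be already separated by the parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical.
final class StringSetExpansionSketch {

    /** Result of an expansion: how many strings must match, and which candidates. */
    static final class StringSet {
        final int required;
        final List<String> names;

        StringSet(int required, List<String> names) {
            this.required = required;
            this.names = names;
        }
    }

    /**
     * @param quantifier "all", "any" or a number such as "3"
     * @param selector   "them", "($prefix*)" or "($name1, $name2)"
     * @param strings    the strings section of the rule (identifier to value)
     */
    static StringSet expand(String quantifier, String selector, Map<String, String> strings) {
        List<String> selected = new ArrayList<>();
        String inner = selector.replace("(", "").replace(")", "");
        for (String part : inner.split(",")) {
            String item = part.trim();
            for (String name : strings.keySet()) {
                boolean matches = item.equals("them")
                        || (item.endsWith("*")
                            && name.startsWith(item.substring(0, item.length() - 1)))
                        || name.equals(item);
                if (matches && !selected.contains(name)) {
                    selected.add(name);
                }
            }
        }
        int required;
        if (quantifier.equals("all")) {
            required = selected.size();
        } else if (quantifier.equals("any")) {
            required = 1; // "any of them" equals "1 of them"
        } else {
            required = Integer.parseInt(quantifier);
        }
        return new StringSet(required, selected);
    }

    /** The strings section of APT1_RARSilent_EXE_PDF, in declaration order. */
    static Map<String, String> sampleStrings() {
        Map<String, String> strings = new LinkedHashMap<>();
        strings.put("$winrar1", "WINRAR.SFX");
        strings.put("$winrar2", ";The comment below contains SFX script commands");
        strings.put("$winrar3", "Silent=1");
        strings.put("$str1", "Setup=[\\s\\w\\\"]+\\.(exe|pdf|doc)");
        strings.put("$str2", "Steup=\\\"");
        return strings;
    }

    public static void main(String[] args) {
        StringSet all = expand("all", "($winrar*)", sampleStrings());
        System.out.println(all.required + " of " + all.names);
        StringSet one = expand("1", "($str*)", sampleStrings());
        System.out.println(one.required + " of " + one.names);
    }
}
```

Each expanded set then corresponds to one new node that holds the selected strings, as the Comment column of the table describes.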

The rule is parsed into an internal format that has approximately the same structure as the IOC format. As development of the application continues, this format could (and is likely to) change to reflect new requirements. As the next step, a custom converter can be implemented that translates the internal format into the desired output. The internal format is used because objects integrate very well with the XML format and because an intermediate format could help when converting to other formats. The drawback is that the internal format has to keep all the information that a target format requires.
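A converter from such an internal tree to IOC output might look like the following sketch. It is an assumption-laden illustration and not the tool's implementation: the Node, Group and StringItem classes are hypothetical, the id attributes that real indicators carry are left out for brevity, and only the Indicator, IndicatorItem, Context and Content elements that appear in the sample in appendix B are produced.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all class names here are hypothetical.
final class IocRenderSketch {

    abstract static class Node {
        abstract void render(StringBuilder out, String indent);
    }

    /** A logical group of child nodes, rendered as an Indicator element. */
    static final class Group extends Node {
        final String operator; // "and" or "or"
        final List<Node> children;

        Group(String operator, Node... children) {
            this.operator = operator;
            this.children = Arrays.asList(children);
        }

        @Override
        void render(StringBuilder out, String indent) {
            out.append(indent).append("<Indicator operator=\"").append(operator).append("\">\n");
            for (Node child : children) {
                child.render(out, indent + "  ");
            }
            out.append(indent).append("</Indicator>\n");
        }
    }

    /** A single string to look for, rendered as an IndicatorItem element. */
    static final class StringItem extends Node {
        final String text;

        StringItem(String text) {
            this.text = text;
        }

        @Override
        void render(StringBuilder out, String indent) {
            out.append(indent).append("<IndicatorItem condition=\"contains\">\n");
            out.append(indent).append("  <Context document=\"FileItem\"")
               .append(" search=\"FileItem/StringList/string\" type=\"mir\"/>\n");
            out.append(indent).append("  <Content type=\"string\">")
               .append(escape(text)).append("</Content>\n");
            out.append(indent).append("</IndicatorItem>\n");
        }
    }

    /** Escapes the characters that are special in XML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        root.render(out, "");
        return out.toString();
    }

    /** A small tree using strings from the use case in section 5.6. */
    static Node sample() {
        return new Group("and",
                new StringItem("WINRAR.SFX"),
                new Group("or",
                        new StringItem("Setup="),
                        new StringItem("Steup=\\\"")));
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```

Keeping the rendering in the node classes means that a converter to another target format would need its own rendering code but could reuse the same tree.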

5.5 How to Run the Tool

The tool can be run in several ways, though at least a Java JRE is always necessary. The three ways described here are:

• Using a batch script (Windows only)

• Using a Maven command

• Using a server (Tomcat)

Running Using a Batch Script

To run the tool without installing any prerequisites, the user can run the batch script runApplication.bat, which is located on the attached CD (see appendix A).

Running Via Maven

Running the tool via Maven requires a Maven installation. Maven 29 is an Apache project that simplifies dependency declaration and supports many plugins and configurations. Maven's installation instructions and download are located at 30. Once Maven is properly installed, the project can be run by navigating to the main directory of malsigconverter, which contains the file pom.xml, and executing the command mvn tomcat7:run. The application can then be reached at http://localhost:8080/malsigconverter.

Running Using a Server

To have the tool running on a server of the user’s choice (Tomcat is described here), the user must deploy the required .war file to the server. In the case of the Apache Tomcat server 31, the .war file must be copied to the apache_home\webapps directory, e.g.

C:\servers\apache-tomcat-7.0.47\webapps\malsigconverter.war

Again, the application can then be reached at http://localhost:8080/malsigconverter.

29. http://maven.apache.org/ 30. http://maven.apache.org/download.cgi 31. Can be downloaded at http://tomcat.apache.org/download-70.cgi

5.6 A Simple Use Case

A simple use case is as follows. Consider a YARA signature (named apt1 RARSilent.yara) like this32:

    private rule APT1_RARSilent_EXE_PDF
    {
        meta:
            author = "AlienVault Labs"
            info = "CommentCrew-threat-apt1"

        strings:
            $winrar1 = "WINRAR.SFX" wide ascii
            $winrar2 = ";The comment below contains SFX script commands" wide ascii
            $winrar3 = "Silent=1" wide ascii

            $str1 = "Setup=[\s\w\"]+\.(exe|pdf|doc)" wide ascii
            $str2 = "Steup=\"" wide ascii

        condition:
            all of ($winrar*) and 1 of ($str*)
    }

    rule APT1_known_malicious_RARSilent
    {
        meta:
            author = "AlienVault Labs"
            info = "CommentCrew-threat-apt1"

        strings:
            $str1 = "Analysis And Outlook.doc\"" wide ascii
            $str2 = "North Korean launch.pdf\"" wide ascii
            $str3 = "Dollar General.doc\"" wide ascii
            $str4 = "Dow Corning Corp.pdf\"" wide ascii

        condition:
            1 of them and APT1_RARSilent_EXE_PDF
    }

32. Signature taken from [15] and modified; $str1 was representing a regular expression string that is not supported by IOC.


There are two rules. Rule APT1_RARSilent_EXE_PDF matches when all strings with the prefix $winrar, i.e. $winrar1, $winrar2 and $winrar3, are present and, simultaneously, at least one of $str1 and $str2 is present. The second rule, APT1_known_malicious_RARSilent, matches when at least one of its own strings $str1, $str2, $str3 or $str4 is present and the whole APT1_RARSilent_EXE_PDF rule matches as well.

To use the tool, a user first needs to upload a YARA signature file (fig. 5.1). The user then checks all the signatures to be converted and selects the destination format from the dropdown menu. The conversion is started by clicking the Convert button (fig. 5.2)33. All conversions are listed on the Conversions page (fig. 5.3), and the user can either view them (fig. 5.4) or download them in a .zip file. The original file used for the conversion is also copied, with an .orig suffix added to mark it as the original. The tool automatically creates as many files, each containing a single rule, as there are YARA rules in the original signature file. The user can either view each rule or simply download them.

Figure 5.1: Uploading a file

The resulting IOC indicator can be seen in appendix B.

33. The ’Also delete source file’ button deletes the uploaded file, not the file used for upload


Figure 5.2: Converting a file

Figure 5.3: Listing all created files


Figure 5.4: Viewing a single conversion entry

The tool stores all files that the user works with in the folder

%USER_HOME%\AppData\Roaming\MalsigConverter

e.g.

C:\Users\User\AppData\Roaming\MalsigConverter

on a Windows machine. It contains the folders conversions and upload, in which each user is identified by a session id and then by a folder whose name is created from a timestamp. To remove all files created by the tool, it is sufficient to delete the MalsigConverter folder.
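The folder layout described above can be sketched as follows. The helper class and its method names are hypothetical, and both the nesting order (session id first, timestamp folder second) and the timestamp format used in the example are assumptions made for illustration only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only; the class and method names are hypothetical.
final class StorageLayoutSketch {

    /** The tool's base folder below the given user home directory. */
    static Path baseDir(String userHome) {
        return Paths.get(userHome, "AppData", "Roaming", "MalsigConverter");
    }

    /** Assumed layout: conversions/<session id>/<timestamp>. */
    static Path conversionDir(String userHome, String sessionId, String timestamp) {
        return baseDir(userHome).resolve("conversions").resolve(sessionId).resolve(timestamp);
    }

    /** Assumed layout: upload/<session id>/<timestamp>. */
    static Path uploadDir(String userHome, String sessionId, String timestamp) {
        return baseDir(userHome).resolve("upload").resolve(sessionId).resolve(timestamp);
    }

    public static void main(String[] args) {
        // The session id and the timestamp format are made-up example values.
        String home = System.getProperty("user.home");
        System.out.println(conversionDir(home, "session123", "20140102-155914"));
        System.out.println(uploadDir(home, "session123", "20140102-155914"));
    }
}
```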

6 Conclusion

This thesis has dealt with malware signature conversion and with the motivation needed to understand the importance of malware classification and detection. It first presented various malware types and their properties, their key characteristics and the attack patterns used by attackers. It also described various persistence techniques and methods for detecting these covert approaches to persisting and launching malicious code. The comparison of the formats focused on their basic properties and the differences between them; it concluded that the best solution for malware detection is Mandiant’s IOC signature format, because of its extensibility, support, and scanning and reporting capabilities. The thesis also presented a tool that implements the basic functionality needed for converting between malware signature formats. Further improvements could include support for more expressions and bi-directional conversion.


Bibliography

[1] Lim, Richard. Cybersecurity Spending Rises Across Agency 2013 Budget Requests. http://news.clearancejobs.com/2012/02/28/cybersecurity-spending-rises-across-agency-2013-budget-requests/, 2012. [Online; 2014-01-03].

[2] Mandiant Corporation. APT1: Exposing One of China’s Cyber Espionage Units. http://intelreport.mandiant.com/, 2013. [Online; 2013-12-21].

[3] Sikorski, Michael; Honig, Andrew. Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software. No Starch Press, 2012. ISBN: 978-1593272906.

[4] Ligh, Michael Hale; Adair, Steven; Hartstein, Blake; Richard, Matthew. Malware Analyst’s Cookbook and DVD: Tools and Techniques for Fighting Malicious Code [With DVD]. Wiley Publishing, 2010. ISBN: 978-0470613030.

[5] Mandiant Corporation. Research: Mandiant ApateDNS. https://www.mandiant.com/resources/download/research-tool-mandiant-apatedns, 2011. [Online; 2013-12-20].

[6] Hungenberg, Thomas; Eckert, Matthias. INetSim: Internet Services Simulation Suite. http://www.inetsim.org/, 2013. [Online; 2013-12-20].

[7] Wireshark Foundation. About Wireshark. http://www.wireshark.org/about.html, 2013. [Online; 2013-12-20].

[8] Mandiant Corporation. An Introduction to OpenIOC. http://openioc.org/resources/An_Introduction_to_OpenIOC.pdf, 2013. [Online; 2013-12-21].

[9] Mandiant Corporation. Current IoC Terms. http://openioc.org/terms/Current.iocterms, 2013. [Online; 2013-12-21].

[10] Alvarez, Victor M. YARA User’s Manual. https://github.com/plusvic/yara/releases/download/v1.7.1/YARA.User.s.Manual.pdf, 2013. [Online; 2013-12-21].

[11] Alvarez, Victor M. YARA User’s Manual. https://github.com/plusvic/yara/releases/download/v2.0.0/YARA.User.s.Manual.pdf, 2013. [Online; 2014-01-05].

[12] Alvarez, Victor M. YARA User’s Manual. https://github.com/plusvic/yara/blob/master/README.md, 2013. [Online; 2014-01-05].

[13] MITRE Corporation. The MAEC Language Version 4.0.1 Specification. http://maec.mitre.org/language/version4.0.1/MAEC_Language_Specification_11-15-2013.pdf, 2013. [Online; 2014-01-03].

[14] Reed, Nathan. The Shunting-Yard Algorithm. http://www.reedbeta.com/blog/2011/12/11/the-shunting-yard-algorithm/, 2011. [Online; 2014-01-03].

[15] jaimeblasco. Improved CommentCrew yara rules. https://github.com/jaimeblasco/AlienvaultLabs/blob/master/malware_analysis/CommentCrew/apt1.yara, 2013. [Online; 2014-01-01].

A Contents of the Attached CD

The attached CD contains files and folders with the following structure:

• application – a folder containing the MalSigConverter application
  ◦ apache-tomcat-7.0.47 – a folder containing Apache Tomcat server with the deployed application
  ◦ java – a folder containing a 32-bit Java Runtime Environment 7
  ◦ src – the source code of the application
  ◦ malsigconverter.war – a .war file of the application (for server deployment)
  ◦ runApplication.bat – Windows batch file for running the application

• images – a folder containing images used in this thesis

• samples – a folder containing samples of malware signatures used in the making of this thesis

• tex – a folder containing the source TeX file of this thesis
  ◦ nemcek_mgrthesis.tex – source TeX file of this thesis
  ◦ nemcek_mgrthesis.pdf – this PDF


B Mandiant’s OpenIOC signature format

    APT1_known_malicious_RARSilent
    CommentCrew-threat-apt1
    AlienVault Labs
    2014-01-02T15:59:14.356+01:00
    Dow Corning Corp.pdf\"
    Dollar General.doc\"
    North Korean launch.pdf\"
    Analysis And Outlook.doc\"
    <Context document="FileItem" search="FileItem/StringList/string" type="mir"/>
    <Content type="string">Steup=\&quot;</Content>
    <Context document="FileItem" search="FileItem/StringList/string" type="mir"/>
    <Content type="string">Setup=[\s\w\&quot;]+\.(exe|pdf|doc)</Content>
    <Context document="FileItem" search="FileItem/StringList/string" type="mir"/>
    <Content type="string">Silent=1</Content>
    <Context document="FileItem" search="FileItem/StringList/string" type="mir"/>
    <Content type="string">;The comment below contains SFX script commands</Content>
    <Context document="FileItem" search="FileItem/StringList/string" type="mir"/>
    <Content type="string">WINRAR.SFX</Content>
    <IndicatorItem condition="contains" id="eaad9c06-3dc7-4e4e-bdbe-462929fa24b6"> </IndicatorItem>
    <IndicatorItem condition="contains" id="9a905870-2017-40e9-b240-8b4b8f878540"> </IndicatorItem>
    </IndicatorItem>
    <IndicatorItem condition="contains" id="afa6f10f-767f-47ff-a60c-5a2abedc2b0a"> </IndicatorItem>
    <IndicatorItem condition="contains" id="322bfdc9-ea1f-4052-a12c-4f3beaaed17a"> </IndicatorItem>
    <IndicatorItem condition="contains" id="be5d2de1-7fcd-48bd-a778-c8f710bbfad4"> </IndicatorItem>
    <Indicator operator="or" id="8fb2dbb3-904e-44d6-858d-2a66c000cae8"> </Indicator>
    <Indicator operator="and" id="db21369f-1df5-4ee6-bb5e-6b717f251f8a">
    ... [closing tags omitted] ...
    </ioc>

C YARA signature format

    rule COMBOS_APT1
    {
        meta:
            author = "AlienVault Labs"
            info = "CommentCrew-threat-apt1"

        strings:
            $s1 = "Mozilla4.0 (compatible; MSIE 7.0; Win32)" wide ascii
            $s2 = "Mozilla5.1 (compatible; MSIE 8.0; Win32)" wide ascii
            $s3 = "Delay" wide ascii
            $s4 = "Getfile" wide ascii
            $s5 = "Putfile" wide ascii
            $s6 = "---[ Virtual Shell]---" wide ascii
            $s7 = "Not Comming From Our Server %s." wide ascii

        condition:
            all of them
    }


D MAEC signature format

    MD5</cyboxCommon:Type>
    <cyboxCommon:Simple_Hash_Value> </cyboxCommon:Simple_Hash_Value>
    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA1</cyboxCommon:Type>
    <cyboxCommon:Simple_Hash_Value> </cyboxCommon:Simple_Hash_Value>
    <cyboxCommon:Hash> </cyboxCommon:Hash>
    <cyboxCommon:Hash> </cyboxCommon:Hash>
    http://cybox.mitre.org/XMLSchema/default_vocabularies/2.0.1/cybox_default_vocabularies.xsd
    http://cybox.mitre.org/objects#WinExecutableFileObject-2
    http://cybox.mitre.org/XMLSchema/objects/Win_Executable_File/2.0.1/
    http://cybox.mitre.org/objects#ArtifactObject-2
    http://cybox.mitre.org/XMLSchema/objects/Artifact/2.0.1/Artifact_Object.xsd" id="maec-example bnd-1" schema_version="4.0.1" defined_subject="true" content_type="dynamic analysis tool output">
    <FileObj:Size_In_Bytes>24840</FileObj:Size_In_Bytes>
    <FileObj:Hashes> </FileObj:Hashes>
    <cybox:Properties xsi:type="WinExecutableFileObj:WindowsExecutableFileObjectType"> </cybox:Properties>
    <maecBundle:Malware_Instance_Object_Attributes>
    <![CDATA[11DF0100 ... [omitted] ... EC43F]]>
    <ArtifactObj:Raw_Artifact> </ArtifactObj:Raw_Artifact>
    <cybox:Properties xsi:type="ArtifactObj:ArtifactObjectType" type="Network Traffic"> </cybox:Properties>
    <maecBundle:Object id="maec-example obj-1"> </maecBundle:Object>
    </maecBundle:Malware_Instance_Object_Attributes>
    <maecBundle:Objects> </maecBundle:Objects>
    </maecBundle:MAEC_Bundle>
