Bachelor Degree Project

What’s the deal with Stegomalware? - The techniques, challenges, defence and landscape.

Author: Kristoffer Björklund
Supervisor: Ola Flygt
Semester: VT 2021
Subject: Computer Science

Abstract

Stegomalware is the art of hiding malicious software with steganography. Steganography is the technique of hiding data in a seemingly innocuous carrier. The occurrence of stegomalware is increasing, with attackers using ingenious techniques to avoid detection. Through a literature review, this thesis explores prevalent techniques used by attackers and their efficacy. Furthermore, it investigates detection techniques and defensive measures against stegomalware. The results show that embedding information in images is common for exfiltrating data or sending smaller files to an infected host. Word, Excel, and PDF documents are common carriers when phishing emails are the entry vector for attacks. Most of the common Internet protocols are used to exfiltrate data, with HTTP, ICMP and DNS shown to be the most prevalent in recent attacks. Machine learning anomaly-based detection techniques show promising results for detecting unknown malware; however, a combination of several techniques seems preferable. Employee knowledge, Content Threat Removal, and traffic normalization are all effective defenses against stegomalware. The stegomalware landscape shows an increase in attacks that use obfuscation techniques, such as steganography, to bypass security, and this trend is most likely to continue in the near future.

Keywords: Stegomalware, steganography, information hiding, covert channel

Preface

I want to give a big thank you to Ola, who supervised and helped me throughout the work on my thesis. Without his guidance and help, this would not have been possible. I also want to thank my friend Niko, who was always there to discuss ideas and to give support during this process. Finally, I of course want to express my appreciation for the support and words of encouragement I have received from my family and partner.

Contents

1 Introduction
  1.1 Background
  1.2 Related work
  1.3 Problem formulation
  1.4 Motivation
  1.5 Scope/Limitation
  1.6 Target group
  1.7 Outline

2 Method
  2.1 Literature review
  2.2 Reliability and Validity
  2.3 Ethical considerations

3 Theoretical Background
  3.1 Steganography
    3.1.1 Image Steganography
    3.1.2 Network Steganography
    3.1.3 Digital Media Steganography
  3.2 Effectiveness
  3.3 Steganalysis
  3.4 Malware
  3.5 Malware detection
  3.6 Advanced Persistent Threats
  3.7 Command and Control

4 Results
  4.1 What are the most prevalent techniques used for stegomalware and how effective are they?
    4.1.1 Digital media files
    4.1.2 Covert channels
    4.1.3 How effective are these techniques?
    4.1.4 Effectiveness of digital media steganography
    4.1.5 Effectiveness of covert channels
  4.2 The main challenges that arise when trying to detect stegomalware
    4.2.1 Challenges when detecting hidden information in digital media
    4.2.2 Challenges when detecting covert channels
    4.2.3 Stegomalware detection techniques
  4.3 Defending against stegomalware
  4.4 The stegomalware landscape today

5 Analysis
  5.1 What are the most prevalent techniques used for stegomalware and how effective are they?
    5.1.1 Digital media files
    5.1.2 Covert Channels
    5.1.3 Effectiveness of digital media steganography
    5.1.4 Effectiveness of covert channels
  5.2 What are the main challenges in detecting stegomalware hidden with these techniques?
    5.2.1 Challenges when detecting hidden information in digital media
    5.2.2 Challenges when detecting covert channels
    5.2.3 Stegomalware detection techniques
  5.3 What is the state of the art when it comes to defending against stegomalware?
  5.4 How common is the use of stegomalware in modern attacks?

6 Discussion

7 Conclusion and Future Work
  7.1 Scientific contribution
  7.2 Future Work

References

1 Introduction

The occurrence of malware is on the rise in our digital landscape [1]. Steganography, in the digital realm, is the science of hiding information in digital structures [2]. Attackers have started to hide malware with steganography to avoid detection, and more sophisticated cyberattacks have become more common as a result [3]. This has led the EU to fund projects such as CUING (Criminal Use of Information Hiding) and SIMARGL (Secure Intelligent Methods for Advanced Recognition of Malware and Stegomalware). CUING reports that the number of incidents involving stegware increased between 2011 and 2019, and states that there is likely more stegware that is yet to be discovered [4].

The aim of this thesis is to investigate the techniques used to hide stegomalware, how to detect and defend against the threat, and the current stegomalware landscape. To achieve this, a literature review was performed using peer-reviewed articles, blog posts, news articles, threat reports and malware analysis documents.

The results showed that images, Word, Excel and PDF documents are the most common carriers when it comes to digital media files. Data embedded in network traffic was most often found in HTTP and DNS traffic; however, ICMP, TCP and UDP were also used. The human factor, the use of common protocols and the need for a combination of detection techniques were identified as challenges for detecting stegomalware. Defending against stegomalware required a combination of multiple factors, such as trained employees and updated systems. The current landscape showed several newly reported attacks that use steganographic techniques to hide malware.

1.1 Background

Steganography literally means "covered writing" or "secret writing", derived from the Greek words steganos (covered) and graphein (to write) [5]. It is the process of concealing information in a seemingly innocuous cover medium. It is far from a new concept but has found a new use case in the cyber world as a key feature for obfuscating malicious software. Such software, written with malicious intent, is called malware and has been around since the 1980s. It is commonly developed by cybercriminals to, for example, leak sensitive information, damage computer systems or steal banking information. Malware is an umbrella term for viruses, worms and spyware, to name a few. In recent times, with the rise of more sophisticated malware, the use of steganography has also increased.

Combining steganography and malware has given rise to a new threat. Stegomalware, or stegware, is a term for malware that has been hidden with steganographic techniques. Since the malware is hidden within another medium, traditional anti-malware software often fails to detect it. There are a few different techniques for hiding malware to avoid detection. Malware can, for example, be hidden in the pixels of an image, included in document macros or embedded in network packets.

Although steganography can be combined with malware for malicious purposes, steganography in itself can be used with good intent as well. It is a useful tool for communicating freely in countries where free speech is not a given right. Steganography can also be used by journalists reporting news from countries where the government might want to censor what is reported.

1.2 Related work

The EU-funded project SIMARGL categorises stegomalware into three groups [4]. Group 1 is defined as malware that uses some kind of digital media file as the carrier for secret data. Group 2 is defined as malware that embeds secret data by modifying the file structure of a digital image. Lastly, group 3 is defined as malware that injects the secret data into network traffic.

Mazurczyk and Caviglione [6] bring up a different classification. Two of their groups are similar: modifying a digital image and injecting data into network traffic. However, their third group includes methods that modulate shared hardware/software resources to hide information.

An overview of current malware, including information hiding techniques and detection methods, is given in [7]. The authors describe how steganography can be used to hide malware, as well as how it can be combined with ransomware and exploit kits to make them more advanced.

Digital media files have been used as carriers in several reported cyberattacks [8], the most popular file type being the digital image. This technique can serve many different purposes, for example sending a URL to malware that has already infected a computer, which the malware then uses to download additional code or instructions. The whole malicious code can also be sent in this manner [8].

Exploit kits are automated threats that can be anything from ransomware to rootkits [9]. Combining steganographic techniques with exploit kits allows users of an exploit kit to avoid detection more easily. The first exploit kit to implement information hiding, the Stegano/Astrum exploit kit, appeared in 2016 [7]. It was used to embed malware into ads on websites, which is referred to as "malvertising".

Hiding information within, for example, legitimate network traffic is referred to as covert channels or network steganography [10, p. 41-44].
These two terms are used interchangeably, which will be discussed in later chapters. Several techniques have been proposed that use different protocols, such as TLS [11] and IPv6 [12]. In 2003, Szczypiorski proposed a method that intentionally uses incorrect checksums within transmission frames for covert communication [13].

[14] presents a technique to detect hidden communication within covert channels. Using histograms of the time relations between IP packets, attacks that use network steganography can be detected. [15] shows a way to detect a payload that has been embedded into a PDF document with a tool called OpenPuff. By looking for certain hexadecimal values, the authors show that a short script can confidently detect the presence of hidden information.

In [16] the authors present a detection method involving the Linux kernel. By leveraging the extended Berkeley Packet Filter (eBPF), they can monitor and trace the behavior of software processes and network traffic. They conduct two tests, one with colluding applications and another with a covert channel implemented in IPv6 traffic.

Mobile malware has reportedly increased [1]. On smartphones, local covert channels can be set up between two colluding applications with the intention of stealing personal information. In [17] the authors present a method for detecting these covert channels by measuring the energy consumed by the device. The tests used two different detection methods based on artificial intelligence.

1.3 Problem formulation

The field of stegomalware research is growing, and quite a lot of research is therefore available online. Articles have summarized steganography techniques and different stegomalware methods [7, 18]; however, these have given more of a bird's-eye view. To the best of my knowledge, no report has given a detailed summary, nor discussed the efficacy of the different techniques and the main challenges that arise when detecting stegomalware that uses them.

Defending oneself against stegomalware goes hand in hand with detecting it. However, this is not discussed much in the literature. Moreover, attacks mentioned in peer-reviewed articles are often several years old, and no peer-reviewed article I found during my research discusses the occurrence of stegomalware in the most recently reported attacks.

RQ1  What are the most prevalent techniques used for hiding stegomalware and how effective are they?
RQ2  What are the main challenges in detecting stegomalware hidden with these techniques?
RQ3  What is the state of the art when it comes to defending against stegomalware?
RQ4  How common is the use of stegomalware in modern attacks?

1.4 Motivation

The occurrence of malware is increasing [1], and the use of steganographic techniques in combination with malware is being spotted more often in the wild [4]. Since this trend is most likely to continue as cyberattacks become more sophisticated, it becomes more important for security personnel to be informed about the modern landscape. The aim of this thesis is to inform the target group and others about the threat of stegomalware, in an attempt to give this threat the attention it requires when it comes to security.

An arms race is happening right now between the developers of malware and the cyber security community [7]. Falling behind when it comes to detecting and defending against stegomalware may lead to devastating events.

1.5 Scope/Limitation

In this thesis I limit myself to malware obfuscated with steganographic techniques and exclude other types of obfuscation, such as those seen in [19]. The most widely used types of steganographic techniques, and those discussed in this thesis, are digital media steganography (for example image, text, audio and video) and network steganography (covert channels).

1.6 Target group

This research focuses on analyzing the various methods and techniques used for stegomalware and the challenges involved in detecting them. Moreover, the defensive strategies against stegomalware are discussed, as well as the role stegomalware plays in modern cyberattacks. This is achieved with a literature review. This thesis aims to provide IT-security professionals with information about how an attacker may use these techniques to avoid detection upon intrusion or once inside their systems or network. Moreover, this thesis can serve as groundwork for researchers within the IT-security community, as well as for developers of software that detects this type of threat.

1.7 Outline

The rest of this thesis is organized as follows. Chapter 2 introduces and discusses the chosen methodology, the research method and the ethical considerations. Chapter 3 introduces the theoretical background and the concepts needed to understand the area of stegomalware. Chapter 4 presents the results of the literature review, with each research question answered in order. Chapter 5 analyses the results presented in Chapter 4. Chapter 6 discusses areas of interest and observations made during the work on this thesis. Chapter 7 briefly summarizes and concludes the results and introduces possible future work.

2 Method

This chapter presents the scientific method that was chosen to answer the research questions. This thesis used a literature review.

2.1 Literature review

A literature review was performed to answer the research questions of this thesis. There are articles that describe the common techniques used to hide malware with steganography; however, to the best of my knowledge, no article describes both these techniques and their effectiveness in a single report.

Articles were chosen if they came from trusted databases. Blog posts and news articles were used if their claims could be confirmed by several sources, to make sure that no misinformation was presented. Moreover, threat reports were included if they came from well-known organizations within the IT-security community. The same reasoning went into choosing the malware analysis reports. The table below shows the types of sources used and how many of each are included in this thesis.

Source type                  Number of sources
Peer-reviewed articles       61
Blog posts/News articles     25
Threat reports               7
Malware analysis             6

Several databases were searched when looking for articles for this thesis: Google Scholar, Web of Science, Science Direct, Research Gate, IEEE, ACM Digital Library, the SIMARGL (Secure Intelligent Methods for Advanced Recognition of Malware and Stegomalware) project and CUING (Criminal Use of Information Hiding). The search terms revolved around "stegomalware", "stegomalware detection" and "stegomalware techniques".

Articles that research information hiding techniques for malware but do not use steganographic techniques were excluded, since they use other methods to obfuscate their existence and are therefore outside the scope of stegomalware.

Since it takes time for peer-reviewed articles to get published, it is hard to find articles that mention the latest news. To close this information gap, blog posts and news articles were also searched to find the newest cyberattacks involving stegomalware. These were found through ordinary Google searches; however, each chosen source was also checked to make sure that no misinformation was presented. Furthermore, threat reports were used to provide information about the current and recent landscape. Malware analysis publications were also used to gain a deeper understanding of how certain attacks worked. All sources used were in English.

A systematic literature review was not performed because the area in which to find information would most likely have been too small. Being able to search in many places, with blog posts and news articles as available information, allows this work to consider information beyond peer-reviewed articles, for example when it comes to recent cyberattacks.

2.2 Reliability and Validity

Anyone who performs the searches described in the previous section should arrive at the same information and results, which gives the results of the literature review reliability.

When looking at blog posts and news articles, I always made sure to find other posts that state the same things. This limits the risk of relying on a blog post or news article that contains misleading or false information, a risk that is of course greater than with peer-reviewed articles. Presentations were also used as references, but only if they came from known researchers within the field of stegomalware research or from other trusted sources.

2.3 Ethical considerations

No practical work with malware is presented in this thesis, which may make an ethical discussion seem less significant. However, the subject of malware, and stegomalware specifically, still calls for one. Even though no implementation is proposed here, some of the articles used as references have implemented different techniques and are therefore subject to ethical consideration. The information provided in this thesis is for educational purposes only, and any use of stegomalware with malicious intent is illegal.

When proposing a new method for hiding malware with steganographic techniques and presenting an implementation of it, the ethical thing to do is to also provide ways of detecting the use of the proposed technique.

As stated, no experiments have been done for this thesis. There is, however, a discussion to be had around implementing tools to detect stegomalware, specifically involving GDPR compliance [20]. One point raised is that when detecting malware or stegomalware, the data being processed by the tools is most likely also private or personal data. This means that the processes or elements that companies need in order to be GDPR compliant are themselves affected by the same law. This is of course only applicable to states or companies that are within the EU or handle information about citizens residing in the EU.

3 Theoretical Background

This chapter introduces the theory needed to understand the area of stegomalware. Steganography and its different types are introduced first. Image steganography, although part of digital media file steganography, is discussed separately since it is a more common and more widely discussed technique. Lastly, steganalysis is introduced, followed by malware, Advanced Persistent Threats (APTs), Command & Control and effectiveness.

3.1 Steganography

Information hiding is a term that encompasses both steganography and watermarking [21]. Steganography and watermarking can sometimes be hard to differentiate since they share many similarities. Watermarking is often used to verify, for example, the ownership and authenticity of a digital image. The focus of this section and this thesis is, however, on steganography. Figure 3.1 shows how the two areas are separated, which helps to differentiate watermarking from steganography.

Figure 3.1: Image recreated from [21]

Steganography is the art of hiding information. In the digital world, information is hidden in files or other digital structures [2]. As seen in the figure above, it is the realm of technical steganography that is discussed in this thesis. However, steganography is nothing new. It has been used since ancient Greece, where secret messages were sent on wax-covered tablets. The sender would scrape off the wax, write a message and then apply the wax once more to make the tablet look unused [5].

A carrier in digital steganography can, for example, be an image, a PDF document, a network packet or an audio file. In the ancient Greek example, the tablet would be seen as the carrier. Unlike cryptography, which hides the content of a message, steganography is all about hiding the existence of the message itself. Even if one were to send an encrypted message, the fact that the message is encrypted would raise suspicion and make a third party aware that secret or sensitive information is being sent. If the information is instead sent with a steganographic technique, and this is done correctly, the third party never knows that sensitive or secret information was sent in the first place. The goal of steganography is security through obscurity [2].

It is also possible to combine steganography and cryptography (and possibly to compress the message so that it takes up far less space in the carrier). To do this, one encrypts the message and then embeds it. If a third party intercepts the file in which the information is embedded and manages to extract it, the information is still encrypted and cannot be read by the third party.

A few components are part of the creation of a stego-carrier (carrier media with a steganographic payload). As stated by Johnson and Jajodia [22], a cover medium and the embedded message create a stego-carrier. A key, or stegokey in this case, may also be used to add a layer of security. This may be a shared secret such as a password. They also give a simple formula for how this may look:

Cover medium + embedded message + stegokey = stego-medium [22]

This basic formula is applicable to any type of steganographic technique. This will be discussed in more detail in the following sections.

3.1.1 Image Steganography

Image steganography, as the name suggests, uses an image as the carrier to hide information. Images are the most popular choice of carrier media [23]. Applying the formula above to image steganography, the resulting stego-medium is a stego-image. This term is used in this section to describe an image after information has been embedded.

The field of image steganography is very wide and includes many different techniques for embedding information into an image. Image steganography can be split into different groups, namely spatial/image domain, transform domain, spread spectrum and model-based steganography [22, 24, 25, 26]. Some articles divide the field differently or include more groups than others; however, only spatial and transform domain techniques are discussed in this section, as they are commonly used in steganography tools.

There are specific techniques within the spatial and transform domains. In transform domain steganography, information is hidden by altering coefficients in the frequency domain, obtained for example with the Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT) or Discrete Wavelet Transform (DWT) [27].

In the spatial domain, one of the most widely discussed techniques is Least Significant Bit (LSB) steganography. LSB methods are easy to implement, which is why they are often used in free steganography tools found online [28]. The basic idea behind LSB steganography is to embed the desired message by changing the least significant bit of the color channels in each pixel of an image. If this method is used correctly, the changes to the image are imperceptible to the human eye. There are two types of LSB steganography: LSB replacement and LSB matching. LSB matching is deemed more secure than LSB replacement, i.e. it is harder to detect the presence of embedded information if it is hidden using LSB matching [28].

Regardless of whether LSB matching or LSB replacement is chosen, the desired message is converted into a stream of bits, and it is those bits that are embedded. LSB matching visits each pixel of the cover image (a shared secret key can also be used to pick the pixels in a pseudo-random order) and checks whether the LSB of the chosen pixel matches the next bit of the message. If it matches, nothing is changed and the process continues. If the bits do not match, one is randomly added to or subtracted from the pixel value of the cover image. If the message contains fewer bits than the image has pixels and the pixels are visited in pseudo-random order, the changes are spread evenly across the entire image. LSB replacement, however, simply overwrites the LSBs of the cover pixels [28].

An RGB pixel consists of 3 bytes, where each byte represents one color. For example, a pixel with the value 0,0,255 is entirely blue. Looking closer at each byte, it looks like this:

Figure 3.2: Representation of a byte

The numbers on top represent the value of each bit. When a pixel has the value 0,0,255, all bits of the blue byte are set to 1, as shown in Figure 3.2. The bit with value 1 is the least significant bit, and that is the bit changed when performing LSB steganography. Since there are 3 bytes per pixel, 3 bits per pixel can be modified without a perceptible change to the image. Looking at the two squares of blue in Figure 3.3, there is no noticeable difference between them.

Figure 3.3: Blue color with 1 bit difference. (a) 0,0,255. (b) 0,0,254.
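The two LSB variants described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the implementation of any particular tool: the image is assumed to be a flat list of 8-bit channel values (R, G, B, R, G, B, ...), the values are visited in sequential order rather than in a key-driven pseudo-random order, and all function names are hypothetical.

```python
import random


def message_to_bits(message: bytes) -> list[int]:
    """Turn a message into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]


def bits_to_message(bits: list[int]) -> bytes:
    """Rebuild bytes from a list of bits (incomplete trailing bits are dropped)."""
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def lsb_replace(channels: list[int], bits: list[int]) -> list[int]:
    """LSB replacement: overwrite the lowest bit of each channel value."""
    if len(bits) > len(channels):
        raise ValueError("message does not fit into the cover")
    stego = list(channels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def lsb_match(channels: list[int], bits: list[int], rng: random.Random) -> list[int]:
    """LSB matching: leave matching values alone, otherwise add or subtract one."""
    if len(bits) > len(channels):
        raise ValueError("message does not fit into the cover")
    stego = list(channels)
    for i, bit in enumerate(bits):
        if stego[i] & 1 == bit:
            continue  # LSB already carries the message bit
        if stego[i] == 0:
            stego[i] = 1  # cannot subtract below 0
        elif stego[i] == 255:
            stego[i] = 254  # cannot add above 255
        else:
            stego[i] += rng.choice((-1, 1))
    return stego


def lsb_extract(channels: list[int], n_bits: int) -> list[int]:
    """Read back the lowest bit of the first n_bits channel values."""
    return [value & 1 for value in channels[:n_bits]]
```

For example, with a cover of eight blue pixels (`[0, 0, 255] * 8`) and the two-byte message `b"Hi"`, both `lsb_replace` and `lsb_match` yield a stego list from which `bits_to_message(lsb_extract(stego, 16))` recovers the message, while no channel value differs from the cover by more than one. The receiver is assumed to know the message length in advance; a real tool would have to signal it in some way.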

An image used to embed a message should be chosen wisely. Taking an image from Google or other places online is not the smartest idea, because if the stego-image is captured and the image used as the carrier is found as well, the two can be compared when performing steganalysis. Steganalysis will be discussed in a later section. The best approach is to use a personally taken image and then discard the cover image to make sure that it cannot be uncovered later on.

Since images are widely available and often posted on image sharing websites, it is easy to upload a stego-image intended for a specific recipient without raising suspicion.

3.1.2 Network Steganography

Before going into detail about network steganography, the terms network protocol steganography and covert channels need to be discussed. As stated in [10, p. 42], the distinction between the two terms is not well grounded. Covert channels are defined as communication channels that were neither intended nor designed to transfer information at all. Covert channels were originally defined by Lampson [29]. Steganography and covert channels do not describe or encompass different hiding techniques. It is the evolution of the data carrier used that distinguishes them. Hiding the use of a specific protocol inside another protocol is also referred to as tunneling.

Both terms can be described as network steganography. The definition and use of the term network steganography in this thesis follow the one given in [10, p. 41], i.e. " that network steganography techniques, as other steganography techniques, create covert (steganographic) channels for hidden communication, but such covert channels do not exist in communication networks without steganography (only the possibility for such channels exist a priori)".

Much like image steganography, where the aim is to hide a secret message without significantly altering the carrier, network steganography aims to hide information in normal transmission without significantly altering the carrier. Since network packets cannot be seen by humans in the way that, for example, images can, network steganography is mainly designed to fool network devices so that it is not detected.

In the field of network steganography, there are carriers and subcarriers. A carrier is an overt traffic flow between the sender and the receiver of the hidden information. A carrier can possess many places in which to hide information. Such a place is called a subcarrier and is a "place" or a timing of certain "events" of a carrier. It can, for example, be padding, header fields or even a specific sequence of packets. Subcarriers are usually based on timing or storage to act as the covert channel [10, p. 45].

Secret information can be sent by using different packet types to signify bits. For example, one might use the UDP protocol to signify "1" and ICMP to signify "0". Looking at Figure 3.4, sending the packets in this order would send the bits "110101". It is also possible to encode information in a similar way by using, for example, multiple TCP connections and letting the timing of packets in the different streams signify "1" or "0" [10, p. 46-47].

Figure 3.4: Transmission using different protocols
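The protocol-based encoding shown in Figure 3.4 can be sketched as a simple mapping between bits and protocol names. The following Python sketch is only an illustration: it works on protocol labels and sends no real packets, the names `encode_bits` and `decode_packets` are hypothetical, and the assumption that the receiver ignores unrelated traffic (such as TCP) is mine rather than something stated in [10].

```python
# Mapping assumed from the example in the text: UDP signifies 1, ICMP signifies 0.
PROTOCOL_FOR_BIT = {1: "UDP", 0: "ICMP"}
BIT_FOR_PROTOCOL = {name: bit for bit, name in PROTOCOL_FOR_BIT.items()}


def encode_bits(bits: str) -> list[str]:
    """Translate a bit string such as "110101" into an ordered list of packet types."""
    return [PROTOCOL_FOR_BIT[int(bit)] for bit in bits]


def decode_packets(protocols: list[str]) -> str:
    """Recover the bit string from observed packet types, skipping unrelated protocols."""
    return "".join(
        str(BIT_FOR_PROTOCOL[name]) for name in protocols if name in BIT_FOR_PROTOCOL
    )
```

Calling `encode_bits("110101")` gives the sequence UDP, UDP, ICMP, UDP, ICMP, UDP, and `decode_packets` applied to that sequence returns the original bit string. Each packet carries a single bit, which illustrates why such a channel has a low steganographic bandwidth.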

There are three features that can be used to characterize network steganography methods: steganographic bandwidth, undetectability and robustness. Steganographic bandwidth defines how much data can be sent per time unit (this depends on the method). Undetectability defines whether a third party or an adversary is able to detect the steganographic payload within the carrier. Lastly, robustness defines the amount of alteration that the secret message can take without being destroyed, or the ability of a channel to persist through, for example, filters [10, p. 48]. These characteristics can also be applied to other steganography methods.

As mentioned before, it is possible to use different protocol fields to hide information, for example HTTP header fields, TCP header fields or different IPv6 header fields [12]. One might ask why "steganography-free" protocols are not designed. Doing so is hard as well as very impractical, since it would unreasonably limit the extensibility and/or the functionality of the protocol [10, p. 46]. Network steganography can be achieved with the simplest protocols. However, more sophisticated or complex protocols often present more opportunities to hide information.

3.1.3 Digital Media Steganography

This section briefly discusses steganography using other digital media, such as audio files and PDF documents.

Unlike image or video steganography, which exploits the limitations of the Human Visual System (HVS), audio steganography exploits a property of the Human Auditory System (HAS) called the masking effect. This effect means that a large-amplitude stimulus makes us less sensitive to smaller stimuli [30]. Like image steganography, audio steganography has techniques that reside in different domains, such as the temporal and transform domains. Furthermore, much like in image steganography, there is LSB steganography within this area as well. With audio sampled at 16 kHz, LSB embedding of one bit per sample gives a capacity of 16 kbps [31].

Text steganography is the technique of hiding information in a text document, such as a Word document, and is considered hard since a text document does not contain much redundant information [32]. Small alterations can also have a large impact on the visual appearance of a file. Text steganography can be divided into three main categories based on the embedding technique used. First, character-level embedding directly embeds secrets using different characters inside the text document. This technique also has two sub-categories, cover document necessary and character making. Secondly, bit-level embedding is similar to the types of bit-level embedding mentioned in earlier sections. With this method, the message is converted into bits, for example by taking the binary representation of its ASCII values, and then embedded into the document. Lastly, mixed-type embedding combines the two aforementioned methods, for example by converting a message into bits, mapping the bits to letters and then embedding them using character-level embedding.

The Portable Document Format (PDF) was developed by Adobe.
Since it is a very pop- ular document format, it has become used in the area of steganography. In [33] they present a method of using the ASCII value of Ao that becomes invisible to usual PDF readers. They proposed two different techniques, where you can either embed this between words or be- tween characters in the PDF document. [34] brings up that the second technique increases the size of the PDF document compared to the original which is an obvious disadvantage. Many different techniques and methods exist when it comes to PDF steganography and there are also tools that use PDFs such as OpenPuff. OpenPuff is however seen as insecure accord- ing to [15].
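The bit-level conversion and the LSB idea for audio described in this section can be illustrated with a minimal sketch. It assumes an ASCII message and 16-bit PCM samples held in a plain Python list; the function names are hypothetical and are not taken from [31] or [32].

```python
def message_to_bits(message: str) -> list[int]:
    """Convert an ASCII message to a list of bits, most significant bit first."""
    bits = []
    for byte in message.encode("ascii"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_message(bits: list[int]) -> str:
    """Inverse of message_to_bits; expects a multiple of 8 bits."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return out.decode("ascii")


def embed_lsb(samples: list[int], bits: list[int]) -> list[int]:
    """Overwrite the least significant bit of the first len(bits) samples."""
    if len(bits) > len(samples):
        raise ValueError("cover audio is too short for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(samples: list[int], n_bits: int) -> list[int]:
    """Read back the least significant bit of the first n_bits samples."""
    return [sample & 1 for sample in samples[:n_bits]]
```

Because one bit is stored per sample, one second of audio sampled at 16 kHz offers 16,000 embedding positions in this sketch, and each sample changes by at most one quantization step.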

3.2 Effectiveness

The effectiveness of specific steganographic techniques can be classified in different ways. This was briefly touched upon in Section 3.1.2. Generally, when implementing different steganographic techniques there are a few main aspects to look out for: robustness, undetectability and steganographic bandwidth/hiding capacity.

Robustness means that a stego-object can withstand alterations without losing or destroying the embedded payload. If an image with a secret message is exposed to operations such as cropping or transformation and the message becomes unreadable as a result, the technique is classified as having low robustness. With regard to covert channels, robustness can also be determined based on their survivability, e.g. their ability to persist in the presence of firewalls.

Undetectability is how good a steganographic technique is at avoiding detection. A poorly implemented image steganography technique with too much payload may show visually perceptible distortions, which will of course lead to the secret communication being detected. For image steganography it is therefore important that images do not have any visual artefacts that can give away the presence of a secret message. Using basic LSB steganography without embedding too much information will, for example, be sufficient to avoid visual inspection. However, statistical analysis of an image that has data embedded using LSB, with techniques such as RS-analysis [35], to name one of many steganalysis techniques, can reliably detect embedded messages.

Steganographic bandwidth or hiding capacity is how much information can be embedded into a carrier. Steganographic bandwidth is the term more commonly used when talking about network steganography [10, p. 48]. The hiding capacity of different techniques can differ depending on where the data is hidden. As mentioned in Section 3.1.1, LSB steganography allows 3 bits to be hidden per pixel without this being visually perceptible. Therefore, an image with a larger resolution allows more data to be embedded than a smaller image. When it comes to steganographic bandwidth, there are more factors to take into consideration. If a covert channel is created by embedding secret messages in the Type of Service field of the IP header, instead of establishing a separate overt channel in which the covert channel is embedded, then the steganographic bandwidth will most likely be smaller. The bandwidth also depends, for example, on the packet rate of the network. More packets sent per time unit means that more information can be transmitted.
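The two capacity notions above reduce to simple arithmetic, which the following sketch makes explicit. The 3 bits per pixel and the 8-bit Type of Service field come from the text; the image dimensions and the packet rate of 100 packets per second are assumptions chosen only for illustration.

```python
def image_capacity_bytes(width: int, height: int, bits_per_pixel: int = 3) -> int:
    """Hiding capacity of an image when bits_per_pixel LSBs are used per pixel."""
    return (width * height * bits_per_pixel) // 8


def covert_bandwidth_bps(bits_per_packet: int, packets_per_second: float) -> float:
    """Steganographic bandwidth of a storage channel in bits per second."""
    return bits_per_packet * packets_per_second


# A larger resolution gives a larger hiding capacity.
small = image_capacity_bytes(640, 480)      # 115,200 bytes
large = image_capacity_bytes(1920, 1080)    # 777,600 bytes

# An 8-bit header field at an assumed 100 packets per second.
tos_channel = covert_bandwidth_bps(8, 100)  # 800 bits per second
```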

3.3 Steganalysis

Steganalysis is the art and science of detecting the use of steganography given a digital medium [36]. Just like steganography, there are many different techniques, and the goal is to detect a hidden message and preferably be able to extract it. It is also possible to try to distort or destroy hidden information to make it unusable to the receiver. Steganalysis techniques are often referred to as attacks.

There are different types of attacks. A visual attack is, as the name suggests, visually inspecting for example an image to try to identify visual artefacts. This can for example mean looking at the color palette of a bitmap image and viewing the luminance. It is also possible to view images with a hex editor to find messages that might be appended after the EOF marker. This type of steganography is the most basic but also one of the easiest to detect. However, as steganography techniques have become better, visual attacks have become more or less useless. Statistical attacks are therefore much more common. These types of attacks look at the statistics of, for example, images to see if the result deviates from what is considered to be "normal" behavior.

Jessica Fridrich et al. [35] presented Regular-Singular (RS) analysis to detect the use of LSB steganography. They use different functions to categorize pixel groups (Regular, Singular and Unusable). Depending on the resulting relation between the groups, the use of LSB steganography, as well as the length of the embedded message, can be determined.

It is also possible to detect hidden messages by analysing the DCT coefficients mentioned earlier [37]. For a JPEG image without an embedded message, a histogram of the DCT coefficients usually follows a Gaussian distribution. Figure 3.5 shows a generally smooth curve in which 0 clearly has the highest number of occurrences.

A histogram of an image with a message embedded at full capacity may look something like Figure 3.6. It is immediately clear how examining the DCT coefficients can lead to detecting a hidden message. However, how useful this is depends on the embedding algorithm, since some algorithms disturb the histogram less than others.

Machine learning is also being used to detect hidden messages or data. [38] uses machine learning to detect packed executables after converting a .exe file into a greyscale bitmap image. Although this was used for executable files, it is possible to apply machine learning to virtually anything, including all or most forms of steganography.

Figure 3.5: Histogram of a normal image. Recreated from [39]

Figure 3.6: Histogram with hidden message embedded at full capacity. Recreated from [39]
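The hex-editor inspection for data appended after the end of an image, mentioned earlier in this section, can be automated. The sketch below is one possible check for PNG files only: it walks the chunk structure and returns whatever follows the IEND chunk. The function name is hypothetical, and chunk checksums are deliberately not verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_trailing_bytes(data: bytes) -> bytes:
    """Return the bytes that follow the IEND chunk of a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return data[pos:]
    raise ValueError("no IEND chunk found")
```

A non-empty return value does not prove malicious intent, but a legitimate PNG encoder normally writes nothing after IEND, so such files are reasonable candidates for closer inspection.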

3.4 Malware

Malicious software (malware) is nothing new. There are many different types of malware, such as cryptojacking, ransomware, spyware, worms and viruses. According to McAfee [40], ransomware, mobile malware and coin miner malware, to name a few, have all increased in the last year, with an overall increase in malware attacks.

Ransomware is malware that is used to force people to pay criminal actors. First, a machine is infected, for example through phishing or malvertisement. Malvertisement is advertisement, for example on a website, that has embedded malware. The machine can then have its files or hard drives encrypted in an attempt to force the owner to pay a ransom to have the files decrypted. The widely reported WannaCry malware was a form of ransomware. Ransomware is classified as the top priority threat by law enforcement according to Europol [41]. They also report that ransomware is becoming more sophisticated and more targeted than before. Ransomware-as-a-Service (RaaS) on the Darkweb has made it possible for less skilled cybercriminals to use ransomware as a tool to gain money [41].

Cryptojacking is when an unauthorized user abuses services or devices of a third party to mine cryptocurrencies. It can for example use CPU power and bandwidth. There are two types: web-based cryptomining, where the criminal actor has scripts running in a victim's browser, and a second type where the criminal actor has to infect the victim with specific malware to be able to abuse, for example, their CPU power.

Spyware is malware that is installed on a victim's machine and monitors different actions [42]. A victim's web surfing habits as well as credentials such as passwords can be stolen. Their internet habits can be sent back to the criminals and used to produce targeted advertisement. Unlike viruses and worms, spyware does not generally aim to cause damage or spread to other systems [43].

3.5 Malware detection

There are two broad categories of malware detection techniques: anomaly-based detection and signature-based detection [44]. Anomaly-based detection inspects a program and compares it to what it considers "normal" behavior to determine if it is malicious or not. Anomaly-based detection is closely related to behavior-based detection [7]. Specification-based detection is similar to anomaly-based detection. It is rule-based, but unlike anomaly-based detection, which relies on machine learning, a security expert manually defines the rules. It also assumes that any policy violation is malicious [7].

Signature-based detection looks at characteristics of the program being inspected to see if those characteristics match ones that are known to be malicious. Having a large database of signatures requires good management to maintain a proper rule set and to make sure that the system does not produce an excessive number of false alarms. Signature-based detection offers good protection against older but active threats but falls short when it comes to new malware or threats as well as malware that is hidden [7].

Behavior-based malware detection is able to detect malicious behavior during run-time, detect processes that are unidentified, and recognize the type of malware. This allows behavior-based detection to detect unknown malware [7]. There are certain behaviors that it can look for, for example installing rootkits, creating or executing files, disabling security features or protocols and modifying auto-start. There are a few different ways in which behavior can be determined. The system can monitor network traffic, system calls as well as resource changes. Behavior-based detection has to be adjusted to fit the intended environment. Features have to be chosen, such as network features (used port numbers, network usage, number of TCP packets with the SYN flag set [7]), software features (event logs and system calls) and hardware features (battery monitoring, device information and access to the IMEI of a smartphone).

Entropy-based detection is also a technique used for malware detection. It calculates the entropy of certain fields and compares it to a threshold. If the calculated value exceeds the threshold, the object is flagged as suspicious [45, 46].
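The entropy check described above can be sketched as follows. The sketch uses Shannon entropy over byte values, which is a common choice but an assumption here, since the text does not state which entropy measure [45, 46] use. The threshold of 7.5 bits per byte is likewise only an illustrative value.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_suspicious(data: bytes, threshold: float = 7.5) -> bool:
    """Flag a field whose entropy exceeds the (assumed) threshold."""
    return shannon_entropy(data) > threshold
```

Encrypted or compressed payloads tend to approach 8 bits per byte, while plain text and most protocol fields are far lower, which is why a threshold can separate the two in this simplified setting.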

3.6 Advanced Persistent Threats

Advanced Persistent Threats (APTs) are cyber attacks executed by sophisticated and well-resourced (often state-funded) adversaries that target high-profile companies or governments [47]. In this paper, I adopt the definition given by NIST, which defines an APT as [48]:

"An adversary that possesses sophisticated levels of expertise and significant resources which allow it to create opportunities to achieve its objectives by using multiple attack vectors (e.g., cyber, physical, and deception). These objectives typically include establishing and extending footholds within the information technology infrastructure of the targeted organizations for purposes of exfiltrating information, undermining or impeding critical aspects of a mission, program, or organization; or positioning itself to carry out these objectives in the future. The advanced persistent threat: (i) pursues its objectives repeatedly over an extended period of time; (ii) adapts to defenders’ efforts to resist it; and (iii) is determined to maintain the level of interaction needed to execute its objectives".

This definition gives a good idea of what distinguishes APTs from more common threats. Unlike a typical smaller attack, which is motivated by financial incentives or by demonstrating abilities, APTs always have specific targets and clear goals. Since the adversaries that use APTs

are well-resourced and organized, they are able to perform long-term attacks/campaigns against their targets with several repeated attempts. They also use stealth techniques to avoid any detection mechanisms that their target may have. The table below shows a comparison between more traditional attacks and APTs. Recreated from [47].

            Traditional Attacks                          APT Attacks
Attacker    Mostly single person                         Highly organized, sophisticated, determined and well-resourced group
Target      Unspecified, mostly individual systems       Specific organizations, governmental institutions, commercial enterprises
Purpose     Financial benefits, demonstrating abilities  Competitive advantages, strategic benefits
Approach    Single-run, "smash and grab", short period   Repeated attempts, stays low and slow, adapts to resist defenses, long term

Image steganography is, for example, used in APT attacks. A recent report indicates that the North Korean APT group named Lazarus, which is considered to be responsible for the WannaCry ransomware outbreak, seems to have hidden payloads within BMP image files [49]. The company Malwarebytes also reported this after they identified a document on April 13, 2021 that was used to target South Korea [50]. The reporting suggests that the attack starts with a Microsoft Office document containing text that lures the victim into activating macros in order to view the file's content. This triggers a malicious payload. A pop-up message appears and claims that the document was created in an old version of Office, but the macro calls an executable HTA file that is compressed as a zlib file within a PNG file. An HTA file is an HTML file that can also contain VBScript or JavaScript code. During decompression, the PNG is converted to the BMP format, and when this happens, the executable HTA drops a loader for a Remote Access Tool (RAT). This RAT is then able to communicate with a command and control server, from which it can for example receive commands. This shows how a victim can be infected without having any indication that it has happened. Using steganography and other techniques, the malware is able to avoid detection and run in the background for a long time before being detected.

3.7 Command and Control

Command and Control, also known as C&C or C2, is the methodology of establishing a connection with an infected host, which criminals use to control their malware and/or botnet as well as to receive reports from them. In 2016, ransomware under the name of TeslaCrypt received an executable from its C&C server that was embedded in the HTML comment tags of an HTTP 404 error page [7]. Events like this show that C&C communication can be carried out not only through standard encrypted channels, but also through normal, innocent-looking traffic. Fakem RAT was another case, where the C&C traffic was disguised to look like MSN and Yahoo! Messenger or HTTP conversations [51]. The connection to the command and control server is often established early in an attack so that configuration files can be downloaded for the malware or to inform the C2 server of a newly infected machine [52, p. 120][52, p. 78].

C2 channels can have multiple purposes. They can be used to extract information, send commands or configuration files to infected hosts, provide remote access or receive reports

from the victim. C2 channels used in APTs can extract information over a long period of time, leading to large data loss for victims. They are a very effective tool since data can be hidden in protocols that are not blocked by the firewall. Unlike digital media files, this allows a continuous data flow [8]. These types of goals affect the later stages of attacks, where the aim is to gain/maintain access to an infected system. This is why C2 channels are commonly used in APTs.

What is important when it comes to command and control is to make the communication look normal. Downloading many images from an odd website might, for example, raise suspicion. Therefore, public sites such as Twitter can be used to hide the C2 traffic by embedding information into images attached to tweets [53]. Fetching tweets, which are typically small, is not ideal when it comes to large amounts of data, such as an update for the attack. It is therefore better to use tweets to indicate where updates may be available, for example on file sharing sites. Since such a site will not be accessed often, the traffic will most likely be considered normal.
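From a defender's perspective, the TeslaCrypt case mentioned in this section suggests inspecting HTML comments in otherwise unremarkable responses. The following sketch flags long comments that decode cleanly as Base64. That the payload is Base64-encoded, the minimum length of 64 characters and the function name are all assumptions made for illustration; the encoding actually used by TeslaCrypt is not stated in the text.

```python
import base64
import binascii
import re

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def suspicious_comments(html: str, min_length: int = 64) -> list[bytes]:
    """Return the decoded content of long HTML comments that are valid Base64."""
    found = []
    for match in COMMENT_RE.finditer(html):
        # Remove whitespace so that line-wrapped Base64 is still recognized.
        text = "".join(match.group(1).split())
        if len(text) < min_length:
            continue
        try:
            found.append(base64.b64decode(text, validate=True))
        except (binascii.Error, ValueError):
            continue
    return found
```

A heuristic like this will miss payloads that use another encoding, so it is better seen as one signal among several than as a detector on its own.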

4 Results

4.1 What are the most prevalent techniques used for stegomalware and how effective are they?

There are two main categories discussed when it comes to stegomalware: digital media steganography and network steganography. These will be discussed in separate subsections below, including popular techniques within each category.

4.1.1 Digital media files

The table below shows the sources used for this section.

Blog posts/News articles             Reports    Articles
[54, 55, 50, 49, 56, 4, 57, 58, 59]  [60, 52]   [7, 8]

When it comes to digital media files, images seem to be the most popular carrier for hiding malware [4]. There are however many different image formats, JPEG, PNG, GIF and BMP to name a few, with many different techniques for each.

First, earlier malware that used images is reviewed. In 2011, malware was used to leak information on control systems. This malware appended the information that it wanted to exfiltrate to the end of images, which were then sent to a remote server [7]. The Zeus/Zbot malware used a similar technique for hiding configuration files for the malware [8]. Vawtrak is another malware, which hid update files retrieved from its C&C server in the LSBs of favicons. Each favicon is approximately 4 kB, which is enough to carry the update file in its LSBs [60]. Furthermore, Stegoloader is a malware that was identified back in 2013 and that used LSB steganography with BMP and PNG images to hide encrypted URLs, which the malware would use to download additional components [54]. A variant of the aforementioned Zbot malware, dubbed ZeusVM, also uses images to retrieve configuration files [55]. Like Zbot, the configuration files were appended to the end of the image to avoid detection. These types of techniques, where configuration files and similar data are pulled from the attacker's server, can also be classified as a covert channel.

A recently reported threat using malware embedded in images is one where Lazarus APT conceals malicious code in a BMP image [50, 49] (also mentioned in Section 3.6). To the best of my knowledge, however, exactly how the malicious code is embedded into the image has not yet been reported. The attack starts with a malicious macro in a Word document. A pop-up message is displayed and claims that the document was created in an old version of Office, but the macro actually retrieves a PNG image from the active document, which contains a compressed zlib file. In the next step, the PNG file is converted into the BMP format by calling a function. Converting the PNG to BMP automatically decompresses the embedded malicious zlib file as well.

The most common image steganography technique used by attackers is to modify the least significant bit of, for example, the RGB channels in an image [56] (this type of technique was described in Section 3.1.1). Digital images were also shown to be the most common carrier for stegomalware in the data analysed by CUING, where 40.6% of the malware was hidden in images [4].
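To make the LSB technique on RGB channels concrete, the following minimal sketch embeds and extracts a payload in a list of (R, G, B) tuples, using one bit per channel and therefore 3 bits per pixel. It is a generic textbook illustration and not a reconstruction of any of the malware families above; the function names and the in-memory pixel representation are assumptions.

```python
def embed_rgb_lsb(pixels: list[tuple[int, int, int]], payload: bytes) -> list[tuple[int, int, int]]:
    """Write the payload bits into the LSB of each color channel."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    channels = [value for pixel in pixels for value in pixel]
    if len(bits) > len(channels):
        raise ValueError("payload does not fit: only 3 bits per pixel are available")
    for i, bit in enumerate(bits):
        channels[i] = (channels[i] & 0xFE) | bit
    return [tuple(channels[i:i + 3]) for i in range(0, len(channels), 3)]


def extract_rgb_lsb(pixels: list[tuple[int, int, int]], n_bytes: int) -> bytes:
    """Read n_bytes back from the channel LSBs."""
    channels = [value for pixel in pixels for value in pixel]
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in channels[8 * i:8 * i + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)
```

Each channel value changes by at most 1, which is why the modification is not visible to the eye, while the statistical traces it leaves are what steganalysis methods such as RS analysis look for.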

There are many different carriers to choose from when it comes to digital media steganography. As mentioned earlier, audio, video, PDF and other media files can also be used as carriers [18]. Word documents, Excel files and zip files have also been shown to be abused as the entry vector for attacks [49, 52, 57]. In these cases, the Word documents and Excel files contain malicious macros. They are sent to the victim, who is lured into opening the files and enabling macros. This is when the attack starts. Zip files have also been sent via phishing campaigns, where for example an embedded PowerShell script starts the attack [58]. In this example, other scripts are also downloaded and Windows Defender is disabled. PDF files are also used in phishing campaigns, with a reported increase of 1,160% in detected malicious PDF files [59]. PDF files can contain malicious JavaScript code that allows attackers to gain execution control on the infected machine [61]. The more recent trend among malicious PDF documents seems to be traffic redirection. This allows attackers to use a technique called drive-by download, which forces victims to download malware without being aware that it is happening [62].

4.1.2 Covert channels

The table below shows the sources used for this section.

Blog posts/News articles  Reports           Articles
[63, 64, 12]              [65, 66, 67, 50]  [68, 69]

Covert channels have been discussed earlier in this thesis. They are most often used in later stages of an attack for extracting information from the infected machine, sending commands to the malware or giving remote control of the infected machine, for example as backdoors [63]. As discussed in Section 3.1.2, there are different methods for creating different types of covert channels, such as timing channels, sequence channels or utilizing different fields in headers (storage channels) to embed information. To use network steganography/covert channels successfully, two conditions must be met. First, the carrier chosen should be popular, to make sure that the presence of the traffic is not seen as an anomaly. Second, the modifications made to the packets to carry the payload should not be noticeable by the system [63].

Lazarus APT used the ThreatNeedle malware to establish a backdoor after infecting victims with a COVID-19 related spear-phishing email [68]. With this backdoor, the attackers could perform several tasks, such as profiling the system, updating the backdoor configuration and executing received commands, to name a few. Using SSH tunneling and PuTTY PSCP (PuTTY Secure Copy client), they were able to have remote access to infected machines. For uploading stolen data to their C2 server, HTTP POST requests were used.

According to FireEye, the Sunburst malware that was used as the backdoor for the SolarWinds attack also communicated with its C2 server over the HTTP protocol [65]. The malware masks its traffic by making it look like the Orion Improvement Program (OIP) protocol, so as to mimic normal SolarWinds API communication [65].

The DNS protocol is also a popular candidate when it comes to covert channels. It allows large amounts of data to be transferred, since DNS traffic consists of a large number of packets [63]. The DNS protocol is also something that is used everywhere across the Internet, which does not make it stick out

of the ordinary. If an administrator applies rules that are too strict towards the DNS protocol, this may also lead to considerable issues [64]. The W32.Morto bug abused a vulnerability in the Remote Desktop Protocol (RDP) and, once the victim was infected, it made use of the DNS protocol when communicating with its C2 [64]. Another malware that made use of the DNS protocol is the Feederbot malware [66]. As with W32.Morto, it used the DNS protocol for its communication with the C2 server. Furthermore, the PlugX malware is a Remote Access Tool (RAT) that was the most common malware for targeted attacks during 2014 [66]. As with the aforementioned malware, it also used the DNS protocol for its communication with the C2 server. However, the core of this malware supported the use of other protocols as the C2 carrier, which allowed the attackers to use protocols such as TCP, UDP and HTTP for communicating with their C2 server.

A recent blog post from Trustwave shows how attackers utilize the Internet Control Message Protocol (ICMP) to create a tunnel for communicating with the infected machine [67]. This malware was named Pingback and, like many others, uses DLL hijacking. According to Trustwave, the initial entry vector is still being investigated. The ICMP protocol is mainly used for control and diagnostic purposes. It can also be used for malicious purposes, which is why there is a debate about whether ICMP should be disabled or not. In this attack, one DLL uses ICMP for its main communication. The ICMP tunnel mainly uses two types of messages, namely echo (type 8) and echo reply (type 0). According to Trustwave, the attacker always sent 788 bytes in the ICMP data field.

The malware starts a sniffer on every IP address, looking for ICMP packets. It specifically looks for an ICMP echo packet that contains the ICMP sequence number 1234, 1235 or 1236, as depicted in Figure 4.7. These sequence numbers represent three different messages. 1234 states that the packet contains a command or data. The other two are used for pure ICMP packet communication: 1235 states that data has been received at the attacker's end and 1236 states that new data has been received by the malware.

This malware also supports different commands. Shell tells the malware to execute a shell command. Exec executes a command on the infected machine. Download has three different modes. The first mode tells the infected machine to connect back to the attacker (this allows attackers to bypass firewalls that block incoming TCP connections). The second mode opens a socket on a specific port to which the attacker connects. The third and final mode is ICMP-based; however, this is very slow and, with the current implementation of the malware, not very reliable when it comes to flow control. The Upload command uploads data and also has three modes similar to Download. As described, Pingback uses the ICMP protocol for initiating the aforementioned commands but then uses TCP for increased performance and reliability.

The attack from Lazarus APT in which a BMP image was used to store malware uses HTTP requests, which are encrypted using a custom algorithm, for its communication with the C2 server, allowing it to receive commands such as sending exfiltrated data to the C2 [50]. The data that is to be sent back to the C2 is encoded and encrypted and sent to the C2 as test.gif using an HTTP POST request.

Covert channels using the IPv6 protocol have also been discussed by Mazurczyk et al. [12]. They discuss the theoretical possibilities, based on [70], of 6 different methods targeting the IPv6 header, followed by an evaluation based on captured traffic.

Figure 4.7: Representation of ICMP packet sent by the attacker. Recreated from [67].
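The indicators reported for Pingback (an ICMP echo message, one of the sequence numbers 1234, 1235 or 1236, and 788 bytes of data) can be combined into a simple check. The sketch below parses the 8-byte ICMP header of a message whose IP header has already been removed; the function and constant names are hypothetical, and a real detector would have to capture the packets first.

```python
import struct

PINGBACK_SEQUENCES = {1234, 1235, 1236}
PINGBACK_DATA_SIZE = 788
ICMP_ECHO = 8


def parse_icmp(message: bytes) -> tuple[int, int, int, int, bytes]:
    """Split an ICMP echo-style message into type, code, identifier, sequence and data."""
    if len(message) < 8:
        raise ValueError("an ICMP header needs at least 8 bytes")
    # Layout: type (1), code (1), checksum (2), identifier (2), sequence (2).
    icmp_type, code, _checksum, identifier, sequence = struct.unpack(">BBHHH", message[:8])
    return icmp_type, code, identifier, sequence, message[8:]


def looks_like_pingback(message: bytes) -> bool:
    """True if the message matches all three reported Pingback indicators."""
    icmp_type, _code, _identifier, sequence, data = parse_icmp(message)
    return (
        icmp_type == ICMP_ECHO
        and sequence in PINGBACK_SEQUENCES
        and len(data) == PINGBACK_DATA_SIZE
    )
```

Since all three values are fixed in the reported samples, an attacker could change them easily, so a check of this kind only covers the specific variant described in [67].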

The methods are aimed at 6 different fields and the related mechanisms that are used. The first is the use of (1) Traffic Class. As shown in Figure 4.8, it is 8 bits long and consists of two parts, Differentiated Services Code Point (DSCP), which is the first 6 bits, and Explicit Congestion Notification (ECN), which is the last 2 bits, and it specifies the expected service from the network. Secret data can be embedded here, which gives a bandwidth of 8 bits per packet. However, intermediate nodes may alter this field, which disrupts the covert channel. The second is (2) Flow Label, which is 20 bits long and is used by network nodes to route traffic along the most fitting path. Intermediate nodes should not alter this label. Using the Flow Label allows 20 bits to be transferred per packet. The (3) Payload Length defines the size of the data field of the datagram, which is a maximum of 65,535 bytes [70]. By manipulating this field, information can be hidden by appending arbitrary data to the payload. To prevent the packet from being dropped by intermediate nodes, the checksum has to be updated. The bandwidth of this covert channel depends on the amount of embedded data, but it cannot exceed the maximum size allowed for the datagram. Furthermore, the hidden information should be removed before the packet is delivered to the receiver [12]. The (4) Next Header field states the next header of the payload of the packet. These can be values such as 6 for TCP, 58 for ICMPv6, 1 for ICMP and 17 for UDP. This field can be altered so that it points to a "fabricated"

header containing data. The bandwidth of this technique depends on the size of the fictitious header embedded. Like the aforementioned technique, the hidden information should be removed before being delivered to its final destination. (5) Hop Limit, as the name suggests, defines the maximum number of hops a packet may perform. It is 8 bits long and can therefore have 256 different values. By altering this value, either by increasing or decreasing it for consecutive packets, information can be hidden. As long as it is not interrupted, this gives a bandwidth of 1 bit per packet. Lastly, (6) Source Address contains the network address of the source. By replacing some bits it can reach a maximum bandwidth of 128 bits per packet.

Figure 4.8: IPv6 protocol header. Recreated from [70]

Traffic was captured for four days between Chicago and Seattle to investigate the use of IPv6 in a real-world scenario, and IPv6 was found to make up about 4% of the entire traffic [12]. This was done to investigate which values in the different fields were most common. Starting with Traffic Class, DSCP was observed to only have three possible values. If DSCP is manipulated to contain anything other than one of those three values, it could be considered an anomaly, leading to the covert channel being detected. ECN was observed to have 0 as the value in 99.99% of the packets captured, which eliminates it from consideration. These findings show that only 3 different values can be used instead of the 2^8 = 256 theoretical possibilities, giving a bandwidth of at most 2 bits per packet instead of the 8 proposed in theory. Due to the implementation of the Flow Label field, it was set to 0 in 96% of the packets captured. Since the behavior is hard to predict, it is hard to give a precise bandwidth estimate. The maximum payload length of an IPv6 datagram is 65,535 bytes. However, the maximum size in the observed packets was 1460 bytes, which is typical when looking at the Maximum Transmission Unit (MTU) supported by IEEE 802.3/Ethernet L2 [12]. Assuming an MTU of 1500 bytes, the maximum amount of hidden data that can be sent is 1416 bytes, since 24 bytes are removed for Ethernet, 40 bytes for IPv6 and 20 bytes for TCP. These findings limit the bandwidth using the Payload Length field, since the maximum possible size would present an anomaly leading to possible

detection. The maximum number of theoretically possible values in the Next Header field is 2^8 = 256, since it is 8 bits long. However, according to the data captured, the Next Header field indicated TCP 99.15%, UDP 0.55% and ICMP 0.3% of the time. Due to this, very few packets can be manipulated, leading to a very limited bandwidth. Like the previous field, Hop Limit can also have 2^8 = 256 possible values. The default value for this field is 64; however, the ranges 51-54 and 242-245 were the most commonly observed. The neighbour discovery protocol used by IPv6 manages automatic configuration and address resolution for networks, to name a couple of operations [12]. By paying attention to the Hop Limit value, this field can be modulated between adjacent packets and used as a covert channel, giving a bandwidth of 1 bit per packet as discussed earlier. Lastly, the Source Address is very unreliable, since altering this field may disrupt the network connection. Systems that protect against spoofing can also easily detect this alteration, which will destroy the covert channel [12].

The techniques discussed above are examples of storage channels and tunneling, apart from the reported attacks from Lazarus APT, which used HTTP POST to upload data to their C2 server [50]. Using storage channels allows for much diversity, since there are many redundant fields to be found in common protocols used daily. The use of tunneling may allow an attacker to bypass firewall rules, as in the case of [67], where it allowed the attackers to set up a TCP connection from the inside, which bypasses firewalls that block incoming TCP connections. Using SSH tunneling, which was reported in [68], allows attackers to have secure, encrypted communication with the infected host. Furthermore, according to Enisa [69], 45% of malware sent by email was found in .docx files and 67% of malware was delivered via HTTPS.
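The field layout and the bandwidth figures discussed for IPv6 can be checked with a short sketch. It parses the 40-byte fixed IPv6 header into the six fields considered by [12] and expresses the per-packet bandwidth of a field as the base-2 logarithm of the number of values that can be used without standing out. The function names are hypothetical, and the overhead values in the usage example are the ones quoted in the text.

```python
import math
import struct


def parse_ipv6_header(header: bytes) -> dict:
    """Extract the fields of the 40-byte fixed IPv6 header."""
    if len(header) < 40:
        raise ValueError("the fixed IPv6 header is 40 bytes long")
    first_word, payload_length, next_header, hop_limit = struct.unpack(">IHBB", header[:8])
    traffic_class = (first_word >> 20) & 0xFF
    return {
        "version": first_word >> 28,
        "dscp": traffic_class >> 2,          # first 6 bits of Traffic Class
        "ecn": traffic_class & 0x3,          # last 2 bits of Traffic Class
        "flow_label": first_word & 0xFFFFF,  # 20 bits
        "payload_length": payload_length,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "source": header[8:24],
        "destination": header[24:40],
    }


def bits_per_packet(usable_values: int) -> float:
    """Information carried per packet when usable_values values are available."""
    return math.log2(usable_values)


# Theoretical bandwidth: Traffic Class (8 bits) and Flow Label (20 bits).
traffic_class_theory = bits_per_packet(2 ** 8)   # 8.0
flow_label_theory = bits_per_packet(2 ** 20)     # 20.0

# Observed: only three DSCP values appear in practice.
traffic_class_observed = bits_per_packet(3)      # about 1.58

# Payload Length: MTU minus Ethernet, IPv6 and TCP overhead as quoted in the text.
max_hidden_bytes = 1500 - 24 - 40 - 20           # 1416
```

With three usable values the channel carries about 1.58 bits per packet, which is consistent with the upper bound of 2 bits per packet derived from the captured traffic.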

4.1.3 How effective are these techniques?

In Section 3.7, the theory behind what is considered an effective technique was discussed. Those three aspects are used here when considering the effectiveness of a technique. The desired result is a technique with very large capacity, very high undetectability and very high robustness, so that data is not destroyed if, for example, images or packets are altered. There is, however, always a trade-off between the different aspects. In general, effectiveness can be described as the extent to which the stated objectives are met [71]. The techniques discussed above are mostly techniques discovered in the wild. They must be seen as effective to an extent, since they have been part of successful attacks. However, let us look at the three conditions, capacity/bandwidth, undetectability and robustness, for a further evaluation.

4.1.4 Effectiveness of digital media steganography

Starting off with digital media steganography, most cases use PNG or BMP images. PNG and BMP are lossless image formats, in contrast to JPEG, which is lossy, and GIF, which compresses losslessly but is limited to a 256-color palette. When it comes to the embedding technique used in the attacks mentioned in Section 4.2.1, most of them used LSB or appended files to the end of images [7, 8, 54, 55]. LSB is the simplest and most used embedding technique, and it is typically used with PNG and BMP images. Using LSB on the color planes of an image allows at least 3 bits of secret data to be embedded per pixel. Depending on the resolution or dimensions of the images used in the attacks, one can calculate the amount of data embedded in them. Appending secret data to the end of an image was also used. Although this technique does not necessarily have a capacity limit linked to the size of the image, appending files directly increases the size of the file, which can make the technique easier to detect. The Vawtrak malware used LSB embedding in favicons in order to download malicious payloads [60]. This malware always used 32x32 true-color favicons [72], and the encrypted message extracted from the favicons is always 288 bytes [72].

Embedding information into images is simple, and there are many open source tools online that achieve this. In terms of capacity, it may be sufficient depending on the main goal. If the goal is to send smaller configuration files, URLs that point to where to download information, or simple commands, then the capacity is sufficient. However, if the goal is to exfiltrate larger amounts of data, it is not a suitable choice, since many images would have to be sent to the C2 server.

Images are everywhere on the Internet. This makes them suitable carriers, since uploading and downloading images may not be seen as an anomaly on a "normal" network. Since attackers often use legitimate file sharing sites, the domains that images are uploaded to or downloaded from will not be flagged as irregular [73]. Appending files to the end of an image does not create any visual artefacts that a human can see; however, such files are very easy to detect [37]. Performing steganalysis to see whether an image may be malicious can be time consuming, and if a network does not have active policies for inspecting images, exfiltrating data or sending commands with images will go undetected. Using LSB steganography is simple. However, it is not very robust.
The hidden data can easily be destroyed by simple attacks or lost through image manipulation [74]. Since the formats used were mostly BMP or PNG, the data will also be lost if, for example, the images are converted to another format. Simple steganalysis also seemed to give positive results regarding favicons [72]. These findings show that these types of technique have decent capacity and good undetectability but rather low robustness.

The use of malicious documents or files is a common entry vector in attacks. Word documents and Excel files can contain malicious macros that start a chain of events to initialize an attack. These techniques do not have the goal of carrying larger amounts of information. The strength of these documents lies in how widely they are used among organizations. Excel and PDF files are often used in business communications, which attackers use to their advantage. When it comes to the undetectability of these phishing campaigns, the goal is to trick victims into opening the documents.

Word documents, Excel files and PDF files have been used successfully in several attacks [49, 52, 57], which speaks to the effectiveness of such techniques. The capacity, although not large, is sufficient for these techniques. Since these types of files are common among organizations, they are often trusted. This leads to them being downloaded and opened by victims, giving them good undetectability. Unless an organization uses tools such as Content Threat Removal [75] (discussed further in Section 4.4), which aim to remove parts of documents that can be used for malicious purposes, the aforementioned file types seem to give good robustness, assuming they are delivered and opened in their original format.
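To make the LSB capacity and robustness discussion above concrete, the following is a minimal Python sketch of LSB embedding on a list of RGB pixels. It is an illustration under stated assumptions, not the implementation used by any of the malware discussed: one bit per color channel, pixels given as `(r, g, b)` tuples, and hypothetical function names.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=1):
    """Upper bound on the payload, in bytes, for plain LSB embedding."""
    return width * height * channels * bits_per_channel // 8


def embed_lsb(pixels, payload):
    """Write the payload bits (most significant bit first) into the least
    significant bit of each color value and return the new pixel list."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    flat = [value for pixel in pixels for value in pixel]
    if len(bits) > len(flat):
        raise ValueError("payload too large for this carrier")
    for index, bit in enumerate(bits):
        flat[index] = (flat[index] & ~1) | bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def extract_lsb(pixels, n_bytes):
    """Read n_bytes back from the least significant bits."""
    flat = [value for pixel in pixels for value in pixel]
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in flat[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)
```

Under these assumptions a 32x32 true-color image has an upper bound of 384 bytes, which leaves room for the 288-byte message reported for Vawtrak. Each color value changes by at most 1, which is why the change is not visible, and any operation that rewrites the lowest bit of the color values (for example lossy re-encoding) destroys the payload, which matches the low robustness noted above.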

The table below shows a summary of the effectiveness of the various techniques.

Technique        LSB Steganography   File appended to image   PDF, Word, Excel
Undetectability  Good                Good                     Good
Robustness       Low                 Low                      Good
Capacity         Good                Good                     Low
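As noted in this section, data appended to the end of an image is easy to detect [37]. A minimal Python sketch of such a check for PNG files is given here. It relies on the PNG layout, where the file starts with an 8-byte signature followed by chunks (4-byte length, 4-byte type, data, 4-byte CRC) and ends with the IEND chunk. The function name is hypothetical and the sketch does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def trailing_bytes_after_png(data):
    """Return whatever follows the IEND chunk; empty for a clean file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        (length,) = struct.unpack(">I", data[offset:offset + 4])
        chunk_type = data[offset + 4:offset + 8]
        offset += 8 + length + 4  # chunk header + chunk data + CRC
        if chunk_type == b"IEND":
            return data[offset:]
    raise ValueError("IEND chunk not found")
```

A non-empty result means that something was appended after the image data. This does not by itself prove malicious intent, but it is a cheap first indicator.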

4.1.5 Effectiveness of covert channels

Covert channels, if implemented correctly, can go unnoticed for a long time, as seen in the SolarWinds attack [65, 76]. This is the optimal choice when it comes to exfiltrating large amounts of information. The key aspect to consider, if covert channels are to remain covert, is to have traffic that does not appear as an anomaly. This goes for the type of protocol used and, for example, the amount of information injected into fields.

When it comes to the bandwidth of a covert channel, a few parameters define the steganographic bandwidth. In general, injecting data into a header field gives a bandwidth proportional to the packet rate, i.e. x bits per packet with a flow of y packets per second [12]. In some cases it is easy to know the amount of information sent, for example with the Pingback malware, where the attacker always sent 788 bytes of data in the ICMP data field [67]. Pingback also reportedly used TCP for more reliable data transfer and, assuming that the MTU was the standard 1500 bytes, it would have decent bandwidth depending on the packet rate. The IPv6 header fields discussed earlier showed many possibilities in terms of covert channels. For most fields the possible bandwidth was low, since they could be altered less than in theory in order not to appear as an anomaly. The only field that seemed to give a reasonable bandwidth was the Payload Length field, with 1416 bytes per packet.

For the attacks that use the DNS protocol, it is hard to determine a bandwidth. The W32.Morto malware used the DNS TXT record to fetch encrypted text that contained a signature of a file and an IP address pointing to where to fetch the next download [64]. Even though the theoretical maximum length of the TXT record is 65,535 bytes, most DNS servers limit this. This forces the attackers to use smaller sizes so that they do not appear out of the ordinary.
Assuming they fetch an IPv4 address, which is 32 bits, the actual bandwidth depends on the size of the signature. Some attacks also used the HTTP protocol for communication with their C2 server. HTTP POST was used to upload data to their C2 server [50, 68]. In the case of [50], a GIF file with embedded data was uploaded. Although there is no specific limit to how much data can be sent by HTTP POST, one can assume that the bandwidth used in those cases was around the standard 1500 bytes per packet (standard MTU).

Exactly what counts as "good bandwidth" depends on how much information needs to be sent. ENISA states that 67% of malware is delivered via HTTPS. Assuming that the standard MTU is used, 1500 bytes can be sent per packet. Since this is the standard for most Internet users, a bandwidth of 1500 bytes per packet would be considered good. Except for Pingback, which used the ICMP data field to send data, an average of 1500 bytes per packet can be assumed. Packet rates observed in [12] between Chicago and Seattle showed that HTTP and HTTPS traffic averaged around 40 packets/minute and that other traffic targeting sites like YouTube averaged 120 packets/minute.

Assuming 40 packets/minute and 1500 bytes per packet, this gives 60,000 bytes/minute or 1000 bytes/second. Using the largest covert channel discussed in [12] (Payload Length), one would get 944 bytes/second at the aforementioned rate. Using the same rate for the data sent in the ICMP data field (788 bytes), one would get a bandwidth of 31,520 bytes/minute or approximately 525 bytes/second. These values show that a high steganographic bandwidth can be achieved using these methods.

The protocols used in the aforementioned attacks are very common on the Internet. This allows them to go unnoticed for long periods of time. The Pingback malware established a TCP connection from the inside, which also allowed it to avoid detection. Due to the use of common protocols and their proven ability to stay under the radar, these techniques are considered to have good undetectability.

Robustness of a channel is the ability to stably send covert information [77]. Tests were performed to determine whether the IPv6 covert channels using the Traffic Class, Flow Label, Payload Length and Hop Limit fields in the IPv6 header could plausibly be used to exfiltrate data. Information was injected either in bursts, where information was hidden in consecutive IPv6 datagrams, or interleaved, where information was injected at random times so that traffic would alternate between normal and altered packets. The initial results showed that information injected in the Traffic Class, Flow Label and Payload Length fields with interleaved injection had the best ability to successfully transmit different amounts of data [12]. This suggests that those covert channels have the best robustness of the IPv6 header covert channels discussed. When it comes to the robustness of the other techniques, there is not much to go on.
They used common protocols and managed to blend in with normal traffic, either by exhibiting normal behavior or by hiding behind other traffic. Because there is insufficient information about their robustness, it is not possible to quantify it.

The table below shows a summary of the efficacy of the aforementioned techniques, excluding the IPv6 covert channels.

Protocol         HTTP   DNS    ICMP
Undetectability  Good   Good   Good
Robustness       N/A    N/A    N/A
Bandwidth        Good   Good   Good

The table below shows values based on tests from [12] for IPv6 covert channels.

IPv6 Header Field  Traffic Class   Flow Label   Payload Length
Undetectability    Decent          Good         Good
Robustness         Decent          Decent       Decent
Bandwidth          Low             Low          Good
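The bandwidth figures used in this section follow from multiplying the bytes carried per packet by the packet rate. A minimal Python sketch of that calculation is given here; the function names are hypothetical, and the per-packet sizes and the rate of 40 packets/minute are the values quoted in the text.

```python
def bytes_per_minute(bytes_per_packet, packets_per_minute):
    """Covert bandwidth in bytes per minute."""
    return bytes_per_packet * packets_per_minute


def bytes_per_second(bytes_per_packet, packets_per_minute):
    """Covert bandwidth in bytes per second."""
    return bytes_per_minute(bytes_per_packet, packets_per_minute) / 60


# Per-packet sizes quoted in the text (bytes).
CHANNELS = {
    "Full MTU (HTTP/HTTPS)": 1500,
    "IPv6 Payload Length": 1416,
    "ICMP data field (Pingback)": 788,
}

for name, size in CHANNELS.items():
    print(f"{name}: {bytes_per_second(size, 40):.0f} bytes/second")
```

The same functions can be used to see how the figures scale with other packet rates, such as the 120 packets/minute observed for traffic towards sites like YouTube.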

4.2 The main challenges that arise when trying to detect stegomalware

This section discusses the main challenges that arise when trying to detect stegomalware hidden within covert channels and digital media files, as well as the different types of detection techniques that exist.

4.2.1 Challenges when detecting hidden information in digital media

Images, as stated earlier, are everywhere on the Internet. This is what makes image steganography so popular. The human factor seems to be one of the largest threats when it comes to malware penetrating a network, since many attacks start off with phishing emails [49, 52, 58, 68]. Humans can easily be tricked by phishing emails that initiate a series of events that download malicious software. Once the malware has gained access to the network, it becomes much more difficult to detect [78]. The challenge with humans is teaching them, for example, what to look out for in phishing emails and how to stay safer on the Internet in general, and making sure that they follow implemented security policies or guidelines.

There are tools available for detecting content within images, such as StegExpose or StegDetect. They can, however, miss embedded content, which allows malicious payloads to bypass security. Security measures always have to get it right, while criminals only have to be lucky once to get into a network. For example, Lazarus APT was able to avoid detection by embedding zlib-compressed data within a PNG [50]. A challenge with detecting information hidden in digital media files or documents is having a security mechanism that can do it reliably. As malware evolves, for example by presenting new signatures or by embedding information in a way that evades detection, already implemented security measures quickly become outdated. Furthermore, it is a challenge to know where to find hidden information as well as how to detect it. Since some attackers use legitimate image sharing websites, it can be difficult to notice anomalies in network events such as uploading or downloading images and files [73]. The Stegoloader malware, for instance, downloaded a PNG used in its attack from a legitimate website [56].

4.2.2 Challenges when detecting covert channels

Theoretically, covert channels can be anywhere and everywhere. This is what makes them so hard to detect. Knowing what to look for and where to look for it is key when it comes to detection. Many of the most common protocols on the Internet are used for covert channels, for example HTTP/HTTPS, ICMP, TCP/IP and DNS. The large variety of techniques adds to the difficulty of detecting them [79].

According to ENISA [69], 67% of malware is delivered via HTTPS. Since HTTPS traffic is encrypted, the data sent in each packet cannot readily be inspected. This hinders Deep Packet Inspection (DPI), which relies on inspecting what packets contain. Furthermore, for DPI to be effective, policies must be kept up to date so that malicious traffic does not slip through the cracks. This goes for all types of traffic filters on a network. For example, filter rules that stop incoming TCP connections may be bypassed if malware has already established a foothold within the network and initiates a TCP connection from the inside [67]. Dealing with malicious use of the DNS protocol is another challenge, since applying too strict filters may cause considerable issues with the network [64]. Another challenge is to implement security measures that can tackle several types of covert channels, since focusing on only a single or a few covert channels will leave the system or network vulnerable.

4.2.3 Stegomalware detection techniques

Machine Learning (ML) has shown promising results in detecting stegomalware. Cohen et al. implemented ML for detecting malicious JPEG images, showing promising results after analysing 156,818 JPEG images [80]. A commonly proposed machine learning approach is two-class supervised learning [81]. This approach is useful for detecting variants of known steganographic techniques. This type of algorithm trains on data sets that contain objects both with and without steganography. Another approach that is effective against such variants as well is supervised learning [81]. However, it falls short when completely new techniques are used. Unsupervised anomaly-based learning techniques have also been proposed [82, 83]. For these types of approaches, the algorithm only learns from data sets where steganography is absent in order to build a "normal" profile. New objects are then compared against this profile, and the resulting anomalies can reveal completely new steganographic techniques. Anomaly-based detection with machine learning has also been proposed for detecting anomalies in network traffic, which in turn may reveal the presence of covert channels [84, 85].

Apart from ML-based techniques, behavior-based detection is effective at detecting new malware as well as variants of malware. It is effective against obfuscated malware but produces a high rate of false positives [86]. Model-checking-based and cloud-based malware detection have also shown positive results for detecting obfuscated malware [86].

Image steganalysis is also a great tool for detecting the presence of content embedded in images. Many different techniques have been proposed for detecting embedded content. The most effective and robust are statistical analysis techniques [37]. A review of techniques within different categories of image steganalysis is given in [37]. Entropy-based detection can also be used to detect malware within images.
One can calculate the "word entropy" of attribute values in images, EXIF header values and other attributes. This gives a base value and a threshold value. If the word entropy calculated for a specific field exceeds the threshold, the field can be flagged as suspicious [46]. Altaher et al. [45] proposed the use of entropy for anomaly detection in networks, with experimental results showing the method to be efficient.

No detection method detects all types of stegomalware. A combination of different methods would be best for obtaining more reliable detection results. Machine learning approaches have shown great promise. The need to combine techniques and methods for reliable detection is therefore also a challenge, since the correct security measures need to be used in the right places.
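The entropy-based check described in this section, with a base value and a threshold learned from clean data, can be sketched in Python as follows. This is a minimal illustration and not the method of [46]: it assumes character-level Shannon entropy as the "word entropy", and a threshold of the mean plus k standard deviations of the clean values. Both choices, and all function names, are assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(text):
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(text).values())


def fit_baseline(clean_values, k=3.0):
    """Learn a base value and a threshold from field values known to be clean."""
    entropies = [shannon_entropy(value) for value in clean_values]
    base = mean(entropies)
    threshold = base + k * pstdev(entropies)
    return base, threshold


def is_suspicious(value, threshold):
    """Flag a field value whose entropy exceeds the learned threshold."""
    return shannon_entropy(value) > threshold
```

Because only clean samples are needed to fit the baseline, this mirrors the anomaly-based idea of building a "normal" profile. Note that the entropy of a short string is bounded by its length, so in practice a separate baseline per field would probably be needed.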

4.3 Defending against stegomalware

Detecting and defending against stegomalware go hand in hand. Defending against stegomalware, or hidden malware in general, requires combining best practices for human behavior, tools and techniques. First off, it is commonly stated within the IT security industry that people, for example the employees of a company, are the weakest link when it comes to cybersecurity. Kaspersky released a blog post in 2017 stating that the human factor played a large role in making systems vulnerable [87]. Human error seems to be at fault in the attack reported in [49], where the victim is tricked into opening and viewing a Microsoft Office document. This also seems to be the case in [58], where victims are targeted by phishing emails. These reports make it clear that educating the employees of a company is important. Training employees, informing them of potential threats and making sure they follow the security policies in place are key to defending against cyberthreats [87, 88]. Making sure that macros for Microsoft Office documents are disabled is also a good defence against attacks such as those reported in [49, 57].

Content Threat Removal (CTR) [75, 89, 90] is a tool that can be used to remove hidden threats, such as stegomalware, from digital content. It removes things like macros and scripts that are used to spread malware, as in the aforementioned reports. CTR creates a replica of a received document but strips anything that can be used for malicious purposes. It takes a "zero trust" approach similar to the one that has been used in networks.

Furthermore, keeping anti-virus systems and other software up to date is crucial for reducing exposure to zero-day attacks and other vulnerabilities. Developing your Security Information and Event Management (SIEM) rules and keeping them updated is also important to make sure that they can detect recently reported malware. Keeping a close eye on system calls or events may also give away information about an ongoing attack [91]. Spotting anomalies with one's IDS is a good way to find out whether any covert channels exist in one's network. Updating and improving the signatures and rules used by the IDS can, for example, help to mitigate the threat of an IPv6 covert channel [12].
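Disabling macros, or stripping them as CTR does, presupposes being able to tell whether a document carries macros at all. A minimal Python sketch of such a check for modern Office files is given here. It assumes the usual OOXML package layout, where the document is a zip archive, VBA projects are stored in a part named `vbaProject.bin` and Excel 4.0 macro sheets under `xl/macrosheets/`. The marker list is an assumption and not exhaustive, the function names are hypothetical, and legacy binary formats (.doc, .xls) are not zip archives and are not covered.

```python
import io
import zipfile

# Archive members that usually indicate macros in OOXML files
# (assumption based on the common package layout, not exhaustive).
MACRO_MARKERS = ("vbaProject.bin", "xl/macrosheets/")


def find_macro_parts(file_or_path):
    """Return the archive members that suggest VBA or Excel 4.0 macros."""
    with zipfile.ZipFile(file_or_path) as archive:
        return [name for name in archive.namelist()
                if any(marker in name for marker in MACRO_MARKERS)]


def find_macro_parts_in_bytes(data):
    """Same check for a document held in memory, e.g. an email attachment."""
    return find_macro_parts(io.BytesIO(data))
```

A non-empty result could be used to quarantine an attachment or to hand it to a tool that rebuilds the document without the flagged parts.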
Traffic normalization is also a very common technique for tackling covert channels. Normalizers are deployed at security boundaries and normalize, for example, timing behavior or protocol headers. They have been shown to be very effective against known covert channels [10, p. 209-210]. Stateful normalizers keep track of information about previous packets, which can be used to detect covert channels. Besides normalizing traffic, it is possible to limit the channel capacity to an acceptable rate, where the amount of data sent is so small that the channel is practically ineffective. A channel whose bandwidth is so low that information cannot be leaked before it is outdated is tolerable [92]. In closed networks, protocols that can be used for covert channels should be blocked if possible. A secure network design, where only hosts on the same security level can communicate, should be implemented to avoid leaks from high-security systems to low-security ones [10, p. 213].
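The two ideas above, header normalization and tolerating channels that are too slow to be useful, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the header is modelled as a hypothetical data class rather than a real packet, and overwriting fields such as Hop Limit or Traffic Class on a real network has side effects (for example on loop protection and quality of service) that a real normalizer must take into account.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IPv6HeaderFields:
    """Hypothetical, simplified view of three IPv6 header fields."""
    traffic_class: int  # 8 bits
    flow_label: int     # 20 bits
    hop_limit: int      # 8 bits


def normalize(header, fixed_hop_limit=64):
    """Overwrite the fields so that any information hidden in them is lost."""
    return replace(header, traffic_class=0, flow_label=0,
                   hop_limit=fixed_hop_limit)


def is_tolerable(secret_bytes, channel_bytes_per_second, secret_lifetime_seconds):
    """True if leaking the secret takes longer than the secret stays useful."""
    if channel_bytes_per_second <= 0:
        return True
    return secret_bytes / channel_bytes_per_second > secret_lifetime_seconds
```

After normalization, two packets that differed only in their Hop Limit become identical, so a channel that modulates this field between adjacent packets no longer carries information.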

4.4 The stegomalware landscape today

To understand the current landscape of stegomalware, a review was conducted of recent cases and reports by different actors. CUING [4] shows that the number of events where stegomalware has been detected is increasing. They also state that the amount of malware capable of hiding information is heavily underestimated. This makes it hard to get an accurate estimate of the amount that is out in the wild.

In the SolarWinds attack, a Russian APT actor used several types of malware to attack the SolarWinds supply chain by inserting a backdoor into their product. The hidden backdoor (named Sunburst) allowed the attackers to access systems that were running the product [76, 93]. After this they were able to communicate with their C2 server.

Another recently reported attack was by the APT group Lazarus [49], where they hid payloads in BMP image files. As described in Section 3.5, the attack reportedly started after a victim enabled macros in order to view a Microsoft Office document.

Lazarus also seems to have been the actor in another attack targeting two South Korean companies [94]. The attack abuses a software called WIZVERA VeraPort, which users must download to be able to visit certain domains. Reportedly, Lazarus managed to steal certificates from the two South Korean companies, which allowed them to change the software that was delivered by WIZVERA VeraPort. If a victim visits a malicious website and downloads the altered software, a connection is established between the victim and the attackers' C2 server and a RAT is deployed on the victim's machine.

The use of Excel 4.0 macros to distribute malware has also been reported [57]. This was reported after Reversing Labs [95] found 160,000 Excel documents that used Excel 4.0 macros, of which 90% were classified by them as suspicious or malicious. [95] also reports that the actors behind the malware QuakBot often distribute their payload via Excel documents.

A seemingly new malware has also been reported by The Hacker News [58]. This attack begins with a phishing email containing a zip file that has an embedded PowerShell script. The script downloads an executable, which in turn drops a second executable. The second executable downloads two more executables named def.exe and putty.exe. def.exe is a batch script that disables Windows Defender, while putty.exe contains the payload that connects to the C2 server, allowing the attackers to further exploit the machine.

The cyber security firm Red Canary released a report based on 20,000 confirmed threats [52].
They presented the top techniques used in these threats as well as the top threats they observed in 2020. Among the top 10 techniques used, "Obfuscated Files or Information" was placed seventh, being detected in 7% of the threats, which is about 1,400 of the 20,000 threats [52, p. 10]. In their top ten list of threats, there were four different threats that hid information in innocuous files or used C2 channels to leak information.

The number one threat was TA551, also called Shathak, which uses large-scale phishing campaigns. They state that 15.5% of their customers were affected by this threat [52, p. 78]. TA551 sends a zip attachment that contains a document. If the victim opens the document and has macros enabled, it connects to the attackers' C2 server and begins the next step of the attack.

IcedID was the number four threat, with 7.8% of Red Canary’s customers being affected [52, p. 78]. This banking trojan is the primary payload of the aforementioned threat and establishes a C2 channel after the installer Dynamic Link Library (DLL) is executed and downloads configuration files.

The seventh top threat presented by Red Canary affected 5.8% of their customers [52, p. 78]. This threat is also a banking trojan that is distributed via email. The email contains a malicious Excel document that leverages Excel 4.0 (XLM) macros. These types of macros offer functionality similar to Visual Basic for Applications (VBA) macros. This threat was initially discovered back in 2014, when it delivered malicious Word documents hiding VBA macros.

The final threat presented that used steganographic techniques was Gamarue. It reportedly affected 5% of their customers. This threat is a worm that is primarily spread via USB drives. It had a C2 infrastructure, but this was disrupted back in 2017 according to Red Canary [52, p. 115]. The majority of the Gamarue detections by Red Canary started with a victim clicking on a malicious LNK file that had been disguised as another, legitimate file on the USB drive. Gamarue has been used to spread other malware, perform click fraud and steal information.

Another threat that was not presented in the top ten of their report was dubbed Yellow Cockatoo [52, p. 120]. This threat was also reported by Morphisec, who named it Jupyter Infostealer [96]. It reportedly starts with a downloaded zip file which contains an installer and an executable. They are usually disguised to impersonate legitimate software. After this a C2 channel is established so that the attackers are able to extract data as well as have remote access to the victim [52, p. 120].

Back in July 2019, a blog post on Security Boulevard from Spanning Cloud Apps even named stegomalware (or stegware) the malware of the month [97]. These findings, as well as the creation of organizations such as CUING and SIMARGL, show that stegomalware is becoming more common and that action is required to tackle this threat.

The table below shows a summary of the aforementioned attacks in the same order as presented above.

Actor                  Threat                                         Steganographic Technique
APT                    SolarWinds Supply-chain - "Sunburst" backdoor  HTTP mimicking API communication
Lazarus APT            Unclear                                        Information embedded in BMP & Malicious macro in Word document
Lazarus APT            Supply-chain against South Korean companies    Embedded malicious code in software and then established C2 connection
N/A                    Distribute ZLoader and Quakbot malware         Malicious macro in Excel document
N/A                    Phishing emails - "Saint Bot"                  Malicious PowerShell script embedded in LNK file. Proceeds to establish C2 connection.
TA551 / Shathak        Phishing emails                                Malicious macro in Word document. Proceeds to establish connection with C2 server.
TA551 / Shathak        Banking trojan                                 Often result after initial access from TA551 (above). Executes malicious DLL and establishes connection with C2 server.
TA505 / INDRIK SPIDER  Banking trojan                                 Distributed via malicious Excel macros
N/A                    Worm                                           Distributed via USB. Malicious LNK file disguised as other legitimate file.

5 Analysis

The research conducted to answer the research questions required finding cybersecurity reports, peer-reviewed articles and reported attacks. Using different kinds of sources allowed for a broader picture than one would get from peer-reviewed articles alone. Even though not all sources are peer-reviewed, information from less "trusted" sites, for example about techniques used by attackers, was verified by comparing reports that cover the same attack, to make sure that no misinformation was presented. No large issues arose when answering the research questions, except that it was not possible to present a clear picture using only peer-reviewed articles, since many aspects were not discussed in those sources.

The sources used for this thesis were chosen to be as recent as possible so that the thesis would reflect the current state of stegomalware. Older scientific articles were, however, also included as long as their core aspects of stegomalware were still deemed relevant today. When researching attacks, the focus was on attacks that were reported in 2020 or 2021, for the same reason. Older attacks were presented when the techniques they used also appeared in more recent attacks. The same reasoning applied to the presentation of detection techniques.

5.1 What are the most prevalent techniques used for stegomalware and how effective are they?

The following subsections analyse the results presented in the previous chapter. They follow the same structure as the results, starting with the techniques within both categories and ending with an analysis of the efficacy of the mentioned techniques.

5.1.1 Digital media files

Giving an exact number for the use of each technique is hard. What is presented in the results are techniques used during documented attacks, and many of these attacks have similar characteristics in terms of the techniques used. Images, PDFs, Word documents and Excel files seem to be the most popular carriers, with phishing emails being the most popular entry vector; ENISA reports that 45% of malware delivered by email was found in .docx files. Furthermore, the use of PDF files in phishing emails increased by 1,160% [59]. Since the purpose of steganography is to stay undetected, there are most likely ongoing attacks that use the techniques mentioned [4]. However, it is not possible to determine this until the attacks are detected, reported and analysed. Other malware has used techniques similar to the ones presented in the results; however, not every specific attack was mentioned, since the techniques were similar and further examples were not believed to give a clearer picture. Furthermore, it can be argued that LSB embedding and appending files to the end of images are the most popular techniques when it comes to using images. This might be because these are the easiest techniques to implement and because the attackers' targets have insufficient security measures to tackle this risk.

Furthermore, evidence shows that cybercriminals are abusing individuals via phishing emails to gain access to systems. An employee's poor awareness of what can be malicious content can be argued to be a large risk for companies. According to ENISA [69], 45% of malware delivered by email is distributed via .docx files; however, this can of course vary depending on the industry or the data observed for their report. ENISA's findings, as well as the finding in [59] that malicious PDF files have seen a dramatic increase, show clear tendencies when it comes to what seems to be working for attackers.

With the information available and by comparing old and new attacks, techniques used during older attacks were also found to be used in new attacks, which helps to give a complete and comprehensive picture. Other techniques were also found during the research. However, techniques that did not appear in several reports, or that were similar to another technique, were not presented in the results, since the lack of reporting meant they were not considered prevalent.

5.1.2 Covert Channels

No information was found about IPv6 covert channels being used in documented attacks. The covert channels that are possible using the IPv6 header, as presented in the results, were only tested in [12]. However, since IPv6 will become a larger part of the Internet in the future, IPv6 covert channels are likely to become more common. Whether the covert channels of the future will include any of the methods presented in the results is anyone's guess.

The reports of attacks using the HTTP protocol presented in this thesis suggest that the figure from ENISA [69], with 67% of malware being delivered via HTTPS, is in the right range. Given the number of reported attacks that used HTTP, and that ENISA [69] also reports wide use of the protocol, the evidence is quite clear that this is the most popular protocol for malicious purposes. It can be argued that the second most popular protocol is DNS, due to its wide use across the Internet as well as its use in malware such as DNSChanger, W32.Morto, PlugX and Feederbot.

Finding information to answer this question required looking into many different sources, as mentioned earlier. One challenge that hindered a completely satisfactory answer is that some reports do not specify exactly how a protocol was used, while others, for example the analysis of the Pingback malware, give a very comprehensive account of how the attackers used covert channels. This left some uncertainty regarding, for example, reports of attackers using the DNS protocol. Nevertheless, a sufficient answer is believed to have been provided by presenting which protocols are most commonly used by attackers for covert communication.

As stated above, no documented attack was found to use IPv6 covert channels, but they were discussed because IPv6 will become a larger part of the Internet in the future, which might lead to the protocol being used by cybercriminals.

5.1.3 Effectiveness of digital media steganography

As mentioned in Section 4.1.3, effectiveness was assessed in terms of capacity, undetectability and robustness. All results presented regarding the effectiveness of these techniques are based on theoretical information found either in malware analyses or in the theory behind specific techniques. Since it is not easy to find out exactly which images were used, whether only one kind of image was used or whether the images differed, it was not possible to give a value for how much information could be embedded. For images, it is possible that attackers also embedded information in the second or third least significant bit, leading to larger capacities. Even though the potential capacity of the file formats mentioned in the results may be high, it can be argued that large payloads are not the goal. Macros or VBA scripts are often the payload chosen to initiate an attack.

Undetectability is hard to quantify. It depends entirely on the security measures in place for detecting these types of files. For example, one can embed information at full capacity, leading to large visual artefacts, but if nothing in a system inspects these types of files, the embedding will not be detected regardless. Since larger attacks usually start with reconnaissance, a suitable technique will be chosen with regard to the implemented security measures. However, the fact that images are very common on the Internet, and that attackers sometimes upload images to legitimate websites when exfiltrating data [73], may indicate a lack of awareness.

Evidence shows that the robustness of stego-images is not high [72, 37]. Basic cropping and other techniques can easily destroy embedded information in PNG or BMP images. Even so, this evidently does not deter attackers from using basic embedding techniques, for example for data exfiltration. It can be argued that this is because of the aforementioned security issues that may be present in organizations.

The results provided for this research question are not considered completely satisfactory. This is due to the lack of information found for certain techniques; for example, the image size determines the amount of information that can be embedded while avoiding detection, and the images used were often unknown. Moreover, to give a better result for the robustness of certain techniques, experiments would preferably have been conducted to evaluate this aspect of effectiveness.
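The relation between image size, capacity and robustness discussed above can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not a reconstruction of any analysed malware: it treats the cover as a flat sequence of 8-bit colour samples, uses plain sequential LSB replacement, and all function names are hypothetical.

```python
def lsb_capacity_bytes(width: int, height: int, channels: int = 3,
                       bits_per_sample: int = 1) -> int:
    """Upper bound on payload size when bits_per_sample low bits of every sample are used."""
    return (width * height * channels * bits_per_sample) // 8


def embed_lsb(cover: bytes, payload: bytes) -> bytearray:
    """Write the payload bits (most significant bit first) into the LSB of successive samples."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(stego: bytes, payload_len: int) -> bytes:
    """Read payload_len bytes back from the LSBs, in the same bit order as embed_lsb."""
    out = bytearray()
    for i in range(payload_len):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# Capacity of a hypothetical 1920x1080 RGB image with one and two bits per sample.
print(lsb_capacity_bytes(1920, 1080))                      # 777600
print(lsb_capacity_bytes(1920, 1080, bits_per_sample=2))   # 1555200

# Round trip on toy sample data, then a crude stand-in for lossy processing.
cover = bytes(range(256)) * 2
stego = embed_lsb(cover, b"cmd")
print(extract_lsb(stego, 3))                               # b'cmd'
requantised = bytes(sample & 0xFC for sample in stego)     # clears the two lowest bits
print(extract_lsb(requantised, 3))                         # payload is gone
```

Each sample changes by at most one intensity level, which is why the embedding is visually inconspicuous. Any processing that disturbs the lowest bits, here simulated by clearing them, destroys the payload, which matches the low robustness reported in [72, 37].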

5.1.4 Effectiveness of covert channels

What makes it hard to determine an exact capacity for the covert channels mentioned in the results is finding out exactly how the protocols were used, as well as the packet rate. Since the packet rate fluctuates, most estimates of steganographic bandwidth measure the payload per packet rather than how much data can be sent per unit of time. The estimated bandwidth presented in the results was based on the packet rate given in [12], which provided a good estimate (40-120 packets/minute). Channels with such a low capacity that the data being exfiltrated becomes outdated before it can be leaked would be considered tolerable [92]. It can be argued that the channels discussed would most likely not be considered tolerable; however, this would depend on the amount of data that the attackers wanted to exfiltrate.

The undetectability of covert channels depends on aspects such as whether the protocol is popular, i.e. common in "normal" traffic. The covert channels discussed in the results are all based on very common protocols such as ICMP and DNS. However, it all depends on how the protocol is used. For instance, if a large number of big ICMP packets suddenly appear in a network, this will most likely be flagged as suspicious, leading to detection of the covert channel. It can therefore be argued that the attackers used the protocols in a way that mimics their normal behavior, for example in terms of packet size and frequency. If specific fields in a header are used, these must also mimic normal behavior. The covert channels using specific fields in the IPv6 header were, for example, limited because certain values in specific header fields were more common than others, which limited the number of modifications that could be made.

Robustness for covert channels, as mentioned earlier in the report, is how well the channel persists when faced with firewalls or other devices and security mechanisms that can affect a traffic stream. The robustness of the IPv6 covert channels using different header fields, i.e. their ability to fully transmit different amounts of data, was tested in [12]. As mentioned in the results, there is not much to go on when trying to quantify the robustness of the other channels. They do, however, use very common protocols that might be hard to restrict too much due to the impact this may have on the rest of the traffic.

Much like the previous part of the first research question, the results presented for the effectiveness of covert channels are not seen as completely satisfactory. Vague or insufficient information on the use of certain protocols made it difficult to evaluate, for example, the bandwidth of a given technique, with the exception of the Pingback malware and the IPv6 covert channels, for which much information was available. Conducting experiments would have made it possible to test the capacity, undetectability and robustness of different channels, which may have yielded slightly different results.
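The bandwidth reasoning above (payload per packet multiplied by packet rate) can be sketched as a short calculation. The Python code below uses the packet rates of 40-120 packets/minute from [12]. The payload of 20 bits per packet is an assumption made purely for illustration: it is the size of the IPv6 Flow Label field, and it is not claimed that any channel in the results used the full field. The function names are hypothetical.

```python
FLOW_LABEL_BITS = 20  # size of the IPv6 Flow Label field; full use is an assumption


def bandwidth_bits_per_second(bits_per_packet: int, packets_per_minute: float) -> float:
    """Steganographic bandwidth from payload per packet and packet rate."""
    return bits_per_packet * packets_per_minute / 60.0


def transfer_time_hours(payload_bytes: int, bits_per_packet: int,
                        packets_per_minute: float) -> float:
    """Time needed to move payload_bytes through the channel, ignoring losses and overhead."""
    seconds = payload_bytes * 8 / bandwidth_bits_per_second(bits_per_packet, packets_per_minute)
    return seconds / 3600.0


for rate in (40, 120):  # packets per minute, the range reported in [12]
    bps = bandwidth_bits_per_second(FLOW_LABEL_BITS, rate)
    hours = transfer_time_hours(1_048_576, FLOW_LABEL_BITS, rate)  # 1 MiB payload
    print(f"{rate} packets/min: {bps:.2f} bit/s, about {hours:.1f} h for 1 MiB")
```

Under these assumptions the channel carries between roughly 13 and 40 bit/s, so exfiltrating even one mebibyte takes days. Whether that counts as tolerable in the sense of [92] depends, as noted above, on how much data the attacker needs and how quickly it becomes outdated.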

5.2 What are the main challenges in detecting stegomalware hidden with these techniques?

The three following sections analyse the results presented for the second research question. The results discussed in the first two (Sections 5.2.1 and 5.2.2) are based on challenges identified when conducting research for all research questions. They are based on the strengths of the techniques used by attackers, which make them hard to detect. There are most likely more specific challenges that are not presented in the results. However, the aim was not to identify challenges for each specific technique, but to give a general overview of the challenges that apply to covert channels and digital media steganography in general. The results are therefore considered satisfactory.

5.2.1 Challenges when detecting hidden information in digital media

The challenges for detecting digital media steganography, in terms of its use for stegomalware, focused on the human factor as well as general technical aspects. Human factors can also be seen as the root cause of poor security, for example insufficient filtering rules for a network. However, the human factor was only discussed as the cause of an infection rather than as the cause of poor security design. To be able to discuss poor security choices or similar areas, research would have to be carried out to see what is being done at, for example, a real company. Since several of the attacks mentioned in this thesis started with a phishing email, it can be argued that human error is the largest risk factor for being infected by more advanced threats [87].

When it comes to techniques or other security measures for detecting hidden information, it is hard to be sure that the implemented security features always manage to detect suspicious content [53]. Knowing what to inspect, and where, adds to this challenge. It might not be feasible or logical for a company to inspect every document sent in a network, due to increased latency or other issues that may arise. This requires a balance which can be hard to find.

5.2.2 Challenges when detecting covert channels

The human factor can also be seen as a challenge when it comes to detecting covert channels. For example, if a network administrator does not update and maintain the SIEM rules, this may be seen as human error. This was not discussed in the results, however, since the discussion of covert channels mostly concerns their ability to fool network devices. As presented in the results, what makes covert channels so hard to detect is that they can be anywhere. Since they use common protocols that are key to everyday use of the Internet, these protocols cannot be restricted too much without impacting the overall use of a network. In addition, a channel might be encrypted, which can hinder inspection of the packets sent.

5.2.3 Stegomalware detection techniques

The current discussion about detecting stegomalware is largely centered on ML [56]. Detection techniques using ML have shown good initial results, and it can be argued that ML-based techniques will be a key feature in the future of stegomalware detection. Several ML techniques reviewed in this thesis revolved around anomaly detection [56, 82, 83, 84, 85]. This is because steganography relies on not appearing anomalous; if an anomaly is nevertheless detected, this can lead to the discovery of embedded information. The key, however, is to be able to detect even very small anomalies, since a covert channel, for example, is most likely not going to appear much out of the ordinary.

One issue with ML is overfitting. Overfitting is when an ML model's performance on unseen test data is worse than what was observed on the training data [98]. In terms of malware detection, this can lead to the model missing new malware, since its generalization error is large. It is also important that the ML algorithm is not too sensitive, since this can lead to too many false positives.

Although not mentioned in the results, "traditional" behavior-based or signature-based malware detection is still common [7]. However, ML techniques have shown promising results which may help to increase the odds of systems detecting unknown malware. As stated in the results, no malware detection technique can detect all types of malware, be it obfuscated or not. This makes a combination of different techniques the best solution. Even if ML shows positive results, it all depends on how strong the algorithm is. Whether supervised or unsupervised machine learning is better remains to be seen. The strength of unsupervised learning is that it may be able to detect completely new steganographic techniques, whereas supervised learning falls short in this respect.

The aim of this research question was to give a wide and general overview of techniques that are shown to be effective against obfuscated malware. To this aim, the results presented are deemed sufficient, with different categories of malware detection being discussed.
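The trade-off between detecting small anomalies and producing false positives can be illustrated with a deliberately simple statistical detector. The Python sketch below is not one of the ML techniques from [56, 82, 83, 84, 85]; it is a plain z-score threshold on ICMP packet sizes, used here as a stand-in. The baseline values, the observed values and the function names are all hypothetical.

```python
from statistics import mean, stdev


def fit_baseline(sizes: list[int]) -> tuple[float, float]:
    """Summarise 'normal' traffic as the mean and sample standard deviation of packet sizes."""
    return mean(sizes), stdev(sizes)


def flag_anomalies(observed: list[int], baseline: tuple[float, float],
                   threshold: float = 3.0) -> list[int]:
    """Return the packet sizes whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        raise ValueError("baseline has no variance")
    return [size for size in observed if abs(size - mu) / sigma > threshold]


# Hypothetical ICMP packet sizes (bytes) seen during normal operation.
baseline = fit_baseline([64, 64, 72, 64, 60, 64, 68, 64, 64, 60])

noisy_channel = [64, 66, 1400, 64, 70]  # one oversized packet carrying data
quiet_channel = [64, 65, 63]            # a channel that mimics normal sizes

print(flag_anomalies(noisy_channel, baseline))                 # only the 1400-byte packet
print(flag_anomalies(noisy_channel, baseline, threshold=0.4))  # over-sensitive: benign sizes flagged too
print(flag_anomalies(quiet_channel, baseline))                 # nothing is flagged
```

The oversized packet is caught at a conservative threshold, lowering the threshold starts to flag ordinary packets, and a channel that mimics normal packet sizes is not flagged at all. This is the same tension described above, and it is one reason a combination of detection techniques is preferable.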

5.3 What is the state of the art when it comes to defending against stegomalware?

The results presented for this research question show that there is no specific tool or technique that will defend against being infected by stegomalware. A combined effort including software, people and policies is shown to be the most effective way to deal with stegomalware. Hopefully, these results give a better understanding of how to become safer against the threat of stegomalware.

The fact that there is no single answer regarding the state of the art shows that it is hard to find that "one" solution for defending against stegomalware. This is partly due to the various ways malware can hide, as well as the constant development of new malware [40]. To the best of my knowledge, no articles have been published testing the efficacy of Content Threat Removal (CTR), which was mentioned in the results. The claims for this type of tool are therefore based on what the developers state [89, 90]. To be sure that it is as efficient as stated, tests using the tool would have to be conducted.

Eliminating or limiting covert channels can be done during protocol design, as mentioned in Section 3.1.2. If a protocol were designed in a manner that makes it impossible to use for covert channels, this would eliminate the need to detect or limit some covert channels. This can be achieved, for example, by removing unnecessary redundancies or unclear specifications [10, p. 211]. There are existing methods for detecting covert channels when designing protocols, but it is unclear whether they are used in practice. Identifying possible covert channels is therefore at best ad hoc, or not attempted at all [10, p. 212]. However, since protocol design is not something that everyday companies or people will be doing, it is not considered a precaution that can actively be taken to defend against stegomalware.

Since this thesis performed a literature review, no defences were tested. Had a practical element been included, the results might have given a more decisive answer as to the effectiveness of CTR. Furthermore, contacting cybersecurity firms might have given a different answer by showing how they defend themselves against stegomalware in a real-world scenario. As stated in the beginning of the chapter, there was a lack of information on the topic of defending against stegomalware. Nevertheless, the results presented gave a somewhat satisfactory answer to this research question.

5.4 How common is the use of stegomalware in modern attacks?

The results for RQ4 give a picture of the current landscape when it comes to stegomalware. They show that threat actors are clever and that the security community must constantly keep up to date with the latest trends in information hiding. CUING [4] states that there are probably many more threats and attacks where stegomalware is in play. The fact that the SolarWinds attack infected the victims a year before it was discovered lends credibility to the claim that more attacks are yet to be discovered [76].

The report from Red Canary presented somewhat more concrete values regarding the occurrence of obfuscation techniques. However, I was not able to determine how many customers Red Canary had when the threat report was created, and could therefore not give a number for how many customers were affected by the threats presented in the results. The percentage presented in Red Canary's report was the only value available.

Since there is a limited number of official reports on the extent to which stegomalware is used in modern attacks, answering this question requires finding reported attacks where stegomalware was used, as well as other reports that may help complete the puzzle. It was not possible for this thesis to present a number or estimate of the occurrence of stegomalware in general, due to the lack of time and resources available. Even though more reported attacks involving stegomalware could have been added, the belief is that this would not have given a clearer picture than the one presented, since it would still not make it possible to estimate the occurrence of stegomalware. Consequently, the result presented did not fully satisfy the research question.

If the methodology had been altered to include interviews or questionnaires with security companies or with organizations like CUING and SIMARGL, perhaps a better result could have been presented.

Even if stegomalware might not be the top threat today, the odds of it decreasing seem low. Since the whole point of covert channels and other steganographic techniques is to be undetectable, there are almost certainly ongoing attacks where such techniques are in play and that are yet to be discovered. The techniques used today will continue to evolve, and more sophisticated attacks are likely to appear in the near future.

6 Discussion

This thesis aimed to examine the most prevalent techniques used for stegomalware, the challenges in detecting it, the state of the art for defending against it, and the current landscape of the use of stegomalware in attacks. The evidence shows that stegomalware is a hard threat to tackle. This is also supported by [86], which shows that few malware detection techniques are resistant to obfuscated malware. Furthermore, attacks have been able to evade detection for long periods of time, which also supports this [6, 7, 65].

As with most malware, the most crucial part of defending against stegomalware is managing to detect it. Since there is no single method that can detect all kinds of malware, a combination of methods is needed. Since stegomalware is a relatively "new" phenomenon, the research is focused on finding techniques for detecting hidden malware. This would suggest that the malware itself is not necessarily hard to remove once detected; if removal were the hardest part of tackling stegomalware, the discussion would revolve more around how to remove it. Certain malware mentioned in the thesis are variants of already known malware families, which supports the hypothesis that the challenge is not the malware itself but rather detecting it [7, 8, 51].

When it came to finding information about specific techniques used in attacks, it was at times challenging to find sources that gave a comprehensive analysis of the attacks. When finding sources to answer the third research question, it was found that there was very limited discussion around defending against stegomalware. Most research revolved around the detection of stegomalware. This could be because detecting stegomalware requires more effort than eradicating the malware itself. However, detection and defence go hand in hand, and detection can be seen as one part of the defence. Detection and defence were nevertheless discussed separately to allow for a single clear discussion of the detection techniques proposed in the literature.

The work and research done for this thesis focused on threats targeting companies or organizations. This is partly because the use of stegomalware seems to be more prevalent in more advanced attacks, such as those by Advanced Persistent Threats (APT). The impact stegomalware may have on the general public was therefore not a focus. This does not mean that individuals cannot be victims of this threat. Many attacks start with phishing emails, and phishing targeting individuals has increased during the COVID-19 pandemic [69, 99]. Recommendations for individuals to defend against this threat would therefore be to not interact with emails sent from addresses they do not recognise, to not enable macros in documents unless they are completely sure that they are safe, and to keep their operating system and anti-virus software up to date.

While researching recently reported attacks that used stegomalware, new attacks were frequently being reported on the websites consulted. This supports the view that attacks involving stegomalware are most likely increasing, which was also the trend reported by CUING [4].

Due to the lack of data in certain areas, for example on how certain protocols were used, the results cannot confirm the exact steganographic bandwidth of all techniques used in the attacks presented. The lack of information did not, however, impact the generalisability of the findings. Furthermore, due to limited time, specific detection techniques could not be examined individually in more depth to provide a clearer picture of the most effective techniques. This would have been preferable in order to see whether any specific techniques, for example ML-based techniques, are more developed or more effective against certain obfuscation techniques. Regardless, the aim of finding detection techniques was never to go into too much detail, since the results should still be generalizable.

7 Conclusion and Future Work

The first research question aimed to study the most prevalent stegomalware techniques as well as their efficacy. The research found several prevalent techniques and described how these techniques work. The techniques involve embedding information in images, using malicious macros in documents and hiding information in HTTP, DNS, ICMP, TCP and UDP traffic. Furthermore, the efficacy of each technique was discussed based on its undetectability, capacity/bandwidth and robustness.

The challenges in detecting stegomalware were investigated for the second research question. The human factor was identified as a large liability. Moreover, the study demonstrated challenges linked to the use of common Internet protocols in attacks. Detection techniques were identified, with machine learning techniques showing good promise.

The state of the art when it comes to defending against stegomalware was the focus of the third research question. Recommendations such as keeping security systems up to date and training individuals were identified. Content Threat Removal can tackle digital media steganography. Moreover, traffic normalization and limiting known covert channels can help to tackle the threat of covert channels.

Lastly, the stegomalware landscape was investigated. It was not possible to give an estimate of the number of attacks that employed steganographic techniques. CUING reported an increase between 2011 and 2019, and a report from Red Canary stated that 1,400 of the 20,000 threats they observed used some form of obfuscation. Evidence points to stegomalware continuing to increase, along with the sophistication of attacks.

In conclusion, this work is relevant to the cyber security industry as well as to researchers. It provides a thorough overview of the most prevalent techniques used by cybercriminals deploying stegomalware. This thesis can serve as grounds for further research into this topic, and as evidence that the threat of stegomalware is something to take seriously.

7.1 Scientific contribution

Although there are articles discussing stegomalware techniques, this report is distinguished by giving a comprehensive understanding of stegomalware, discussing the efficacy of techniques used in reported attacks as well as theoretical techniques. Furthermore, an extensive review of the current stegomalware landscape is provided.

7.2 Future Work

Since this was only a literature review, no techniques were tested and no detection or defence methods used in real-life scenarios were reviewed in practice. Future work could therefore involve working with one or more companies to see what methods they use for tackling the threat of stegomalware. Work could also be performed to review protocol design, covering theoretical approaches to minimising covert channels during the design phase as well as the current work put into this area of research.

A more detailed discussion of each malware detection category, as well as of specific techniques within each category, could also be made. Preferably, this would include information about which techniques are deployed by malware detection software, which would give an idea of what works in practice.

References

[1] C. B. et al, “Mcafee labs threat report,” November 2020. [Online]. Available: https://www.mcafee.com/enterprise/en-us/lp/threats-reports/nov-2020.html

[2] M. B. Pope, M. Warkentin, E. Bekkering, and M. B. Schmidt, “Digital steganography—an introduction to techniques and tools,” Communications of the Association for Information Systems, vol. 30, Jun 2012.

[3] D. Puchalski, L. Caviglione, R. Kozik, A. Marzecki, S. Krawczyk, and M. Charos, “Stegomalware detection through structural analysis of media files,” Association for Computing Machinery, August 2020.

[4] “Stegware - the latest trend in cybercrime,” Feb. 14 2020, accessed: 10/04-2021. [Online]. Available: https://simargl.eu/blog/technical/stegware-the-latest-trend-in-cybercrime

[5] N. F. Johnson and S. Jajodia, “Exploring steganography: Seeing the unseen,” IEEE, Feb. 1998.

[6] W. Mazurczyk and L. Caviglione, “Information hiding as a challenge for malware detection,” Security Privacy, IEEE, vol. 13, pp. 89–93, 03 2015.

[7] L. Caviglione, M. Choraś, I. Corona, A. Janicki, W. Mazurczyk, M. Pawlicki, and K. Wasielewska, “Tight arms race: Overview of current malware threats and trends in their detection,” IEEE Access, vol. 9, pp. 5371–5396, 2021.

[8] K. Cabaj, L. Caviglione, W. Mazurczyk, S. Wendzel, A. Woodward, and S. Zander, “The new threats of information hiding: The road ahead,” IT Professional, vol. 20, no. 3, pp. 31–39, 2018.

[9] M. Hopkins and A. Dehghantanha, “Exploit kits: The production line of the cybercrime economy?” in 2015 Second International Conference on Information Security and Cyber Forensics (InfoSec), 2015, pp. 23–27.

[10] W. Mazurczyk, S. Wendzel, S. Zander, A. Houmansadr, and K. Szczypiorski, Information Hiding in Communication Networks: Fundamentals, Mechanisms, Applications, and Countermeasures, 1st ed. Wiley-IEEE Press, 2016.

[11] C. Heinz, W. Mazurczyk, and L. Caviglione, “Covert channels in transport layer security,” in Proceedings of the European Interdisciplinary Cybersecurity Conference, ser. EICC 2020. New York, NY, USA: Association for Computing Machinery, 2020. [Online]. Available: https://doi.org/10.1145/3424954.3424962

[12] W. Mazurczyk, K. Powójski, and L. Caviglione, “Ipv6 covert channels in the wild,” in Proceedings of the Third Central European Cybersecurity Conference, ser. CECC 2019. New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3360664.3360674

[13] K. Szczypiorski, “Steganography in tcp/ip networks and a proposal of a new system - hiccups,” November 2003.

[14] J. Bieniasz, M. Stępkowska, A. Janicki, and K. Szczypiorski, “Mobile agents for detecting network attacks using timing covert channels,” vol. 25, no. 9, pp. 1109–1130, sep 2019. [Online]. Available: http://www.jucs.org/jucs_25_9/mobile_agents_for_detecting

[15] T. Sloan and J. Hernandez-Castro, “Dismantling openpuff pdf steganography,” Digital Investigation, vol. 25, pp. 90–96, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1742287617303286

[16] L. Caviglione, W. Mazurczyk, M. Repetto, A. Schaffhauser, and M. Zuppelli, “Kernel-level tracing for detecting stegomalware and covert channels in linux environments,” Computer Networks, vol. 191, p. 108010, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1389128621001249

[17] L. Caviglione, M. Gaggero, J.-F. Lalande, W. Mazurczyk, and M. Urbanski, “Seeing the unseen: Revealing mobile malware hidden communications via energy consumption and artificial intelligence,” IEEE Transactions on Information Forensics and Security, vol. 11, pp. 1–1, 01 2015.

[18] P. Johri, A. Mishra, S. Das, and A. Kumar, “Survey on steganography methods (text, image, audio, video, protocol and network steganography),” in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 2906–2909.

[19] E. M. Rudd, A. Rozsa, M. Günther, and T. E. Boult, “A survey of stealth malware attacks, mitigation measures, and steps toward autonomous open world solutions,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 1145–1172, 2017.

[20] A. Pawlicka, D. Jaroszewska-Choras, M. Choras, and M. Pawlicki, “Guidelines for stego/malware detection tools: Achieving gdpr compliance,” IEEE Technology and Society Magazine, vol. 39, no. 4, pp. 60–70, 2020.

[21] A. Cheddad, J. Condell, K. Curran, and P. Mc Kevitt, “Digital image steganography: Survey and analysis of current methods,” Signal Processing, vol. 90, no. 3, pp. 727–752, 2010. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168409003648

[22] N. F. Johnson and S. Jajodia, “Steganalysis of images created using current steganography software,” in Information Hiding, D. Aucsmith, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 273–289.

[23] M. S. Subhedar and V. H. Mankar, “Current status and key issues in image steganography: A survey,” Computer Science Review, vol. 13-14, pp. 95–113, 2014. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1574013714000136

[24] M. Kharrazi, T. Sencar, and N. Memon, “Image steganography: Concepts and practice,” vol. 22, 01 2004.

[25] T. Morkel, J. Eloff, and M. Olivier, “An overview of image steganography,” in ISSA, 2005.

[26] N. Hamid, A. Yahya, R. Ahmad, and O. Al-qershi, “Image steganography techniques: An overview,” International Journal of Computer Science and Security, vol. 6, pp. 168–187, 06 2012.

[27] J. Kose, O. B. Chia, and V. Baboolal, “Review and test of steganography techniques,” 2020. [Online]. Available: https://deepai.org/publication/review-and-test-of-steganography-techniques

[28] A. Ker, “Steganalysis of lsb matching in grayscale images,” IEEE Signal Processing Letters, vol. 12, no. 6, pp. 441–444, 2005.

[29] B. W. Lampson, “A note on the confinement problem,” Commun. ACM, vol. 16, no. 10, p. 613–615, Oct. 1973. [Online]. Available: https://doi.org/10.1145/362375.362389

[30] S. Mallat, “Chapter 10 - compression,” in A Wavelet Tour of Signal Processing (Third Edition), third edition ed., S. Mallat, Ed. Boston: Academic Press, 2009, pp. 481–533. [Online]. Available: https://www.sciencedirect.com/science/article/pii/B9780123743701000148

[31] F. Djebbar, B. Ayad, H. Hamam, and K. Abed-Meraim, “A view on latest audio steganography techniques,” in 2011 International Conference on Innovations in Information Technology, 2011, pp. 409–414.

[32] R. B. Krishnan, P. K. Thandra, and M. S. Baba, “An overview of text steganography,” in 2017 Fourth International Conference on Signal Processing, Communication and Networking (ICSCN), 2017, pp. 1–6.

[33] I.-S. Lee and W.-H. Tsai, “A new approach to covert communication via pdf files,” Signal Processing, vol. 90, no. 2, pp. 557–565, 2010. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168409003351

[34] S. G. R. Ekodeck and R. Ndoundam, “Pdf steganography based on chinese remainder theorem,” Journal of Information Security and Applications, vol. 29, pp. 1–15, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S221421261500068X

[35] J. Fridrich, M. Goljan, and R. Du, “Reliable detection of lsb steganography in color and grayscale images.” New York, NY, USA: Association for Computing Machinery, 2001. [Online]. Available: https://doi-org.proxy.lnu.se/10.1145/1232454.1232466

[36] S. Laskar and K. Hemachandran, “A review on image steganalysis techniques for attack steganography,” vol. 3, January 2014.

47 [37] K. Karampidis, E. Kavallieratou, and G. Papadourakis, “A review of image steganalysis techniques for digital forensics,” Journal of Information Security and Applications, vol. 40, pp. 217–235, 2018. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S2214212617300777

[38] C. Burgess, F. Kurugollu, S. Sezer, and K. McLaughlin, “Detecting packed executables using steganalysis,” in 5th European Workshop on Visual Information Processing (EU- VIP). Institute of Electrical and Electronics Engineers (IEEE), Dec. 2014, pp. 101–105, 5th European Workshop on Visual Information Processing, EUVIP 2014 ; Conference date: 10-12-2014 Through 12-12-2014.

[39] S. Johansson and E. Lenngren, “Steganographic embedding and steganalysis evalua- tion,” Kungliga Tekniska Högskolan, 2014.

[40] C. Beek, E. Carroll, M. Cashman, S. Chandana, J. Fokker, M. Gaffney, S. Grobman, T. Holden, T. Hux, D. McKee, L. Munson, C. Palm, T. Polzer, T. Roccia, R. Samani, and C. Schmuger, “Mcafee labs threats report 421,” McAfee, April 2021.

[41] Europol, “Internet organised crime threat assesment(iocta) 2020,” 2020.

[42] M. Egele, C. Kruegel, E. Kirda, H. Yin, and D. Song, “Dynamic spyware analysis.” 01 2007, pp. 233–246.

[43] E. Kirda, C. Kruegel, G. Banks, G. Vigna, and R. Kemmerer, “Behavior-based spyware detection,” 01 2006.

[44] N. Idika and A. Mathur, “A survey of malware detection techniques,” Purdue University, 03 2007.

[45] A. Taha, S. Ramadass, and D. Almomani, “Real time network anomaly detection using relative entropy,” 12 2011.

[46] S. S. Ramaswami, “Using entropy to spot the malware hiding in plain sight,” November 2020, accessed: 13/05-2021. [Online]. Available: https://umbrella.cisco.com/blog/ using-entropy-to-spot-the-malware-hiding-in-plain-sight

[47] P. Chen, L. Desmet, and C. Huygens, “A study on advanced persistent threats,” in Com- munications and Multimedia Security, B. De Decker and A. Zúquete, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 63–72.

[48] R. Ross, “Managing information security risk: Organization, mission, and information system view,” 2011-03-01 2011. [Online]. Available: https://tsapps.nist. gov/publication/get_pdf.cfm?pub_id=908030

[49] C. Osborne, “Lazarus hacking group now hides payloads in bmp image files,” April 2021, accessed: 27/04-2021. [Online]. Available: https://www.zdnet.com/ article/lazarus-state-hacking-group-now-hides-payloads-in-bmp-image-files/#ftag= RSSbaffb68

48 [50] H. Jazi, “Lazarus apt conceals malicious code within bmp image to drop its rat,” accessed: 27/04-2021. [On- line]. Available: https://blog.malwarebytes.com/malwarebytes-news/2021/04/ lazarus-apt-conceals-malicious-code-within-bmp-file-to-drop-its-rat/

[51] N. Villeneuve and J. d. Torre, “Fakem rat malware diguised as windows messenger and yahoo! messanger,” 2013. [Online]. Available: https://documents.trendmicro.com/ assets/wp/wp-fakem-rat.pdf

[52] R. Canary, “2021 threat detection report,” 2021. [On- line]. Available: https://redcanary.com/threat-detection-report/?_bt= 433114880577&_bk=cyber%20threats%20and%20security&_bm=b&_bn= g&gclid=Cj0KCQjw4cOEBhDMARIsAA3XDRjBBwrwPy0Fh9bhmm1_ E6NbYyEmVcded9vlyV_6fKDExLVNznML2o4aAk2xEALw_wcB

[53] S. Wiseman, “Stegware – using steganography for malicious purposes,” 12 2017.

[54] Dell SecureWorks Counter Threat Unit™ Threat Intelligence, “Stegoloader: A stealthy information stealer,” June 2015, accessed: 17/04-2021. [Online]. Available: https://www.secureworks.com/research/stegoloader-a-stealthy-information-stealer

[55] J. Segura, “Hiding in plain sight: a story about a sneaky banking trojan,” March 2016, accessed: 04/05-2021. [Online]. Available: https://blog.malwarebytes.com/ threat-analysis/2014/02/hiding-in-plain-sight-a-story-about-a-sneaky-banking-trojan/

[56] CUING, “Simargl: Stegware primer, part 2,” February 2020, accessed: 09/05-2021. [Online]. Available: https://cuing.eu/blog/technical/simargl-stegware-primer-part-2

[57] R. Lakshmanan, “Cybercriminals widely abusing excel 4.0 macro to distribute malware,” The Hacker News, April 2021, accessed: 29/04-2021. [Online]. Available: https://thehackernews.com/2021/04/cybercriminals-widely-abusing-excel-40.html

[58] ——, “Alert — there’s a new malware out there snatching users’ passwords,” April 2021, accessed: 04/05-2021. [Online]. Available: https://thehackernews.com/2021/04/ alert-theres-new-malware-out-there.html

[59] A. Hosseini and A. Chitwadgi, “2020 phishing trends with pdf files,” April 2021. [On- line]. Available: https://unit42.paloaltonetworks.com/phishing-trends-with-pdf-files/

[60] J. Kroustek, “Analysis of banking trojan vawtrak,” March 2015. [On- line]. Available: https://cdn2.hubspot.net/hubfs/4650993/Blog_Content/Avg/Signal/ avg_technologies_vawtrak_banking_trojan_report.pdf

[61] D. Stevens, “Malicious pdf documents explained,” IEEE Security Privacy, vol. 9, no. 1, pp. 80–82, 2011.

[62] Y. Takata, S. Goto, and T. Mori, “Analysis of redirection caused by web-based malware,” Proceedings of the Asia-Pacific Advanced Network, vol. 32, no. 0, p. 53, dec 2011. [Online]. Available: https://doi.org/10.7125%2Fapan.32.7

49 [63] S. Abarca, “An analysis of network steganographic malware,” Ph.D. dissertation, 12 2018.

[64] M. Drzymała, K. Szczypiorski, and M. Urbanski,´ “Network steganography in the dns protocol,” International Journal of Electronics and Telecommunications, vol. 62, 11 2016.

[65] FireEye, “Highly evasive attacker leverages solarwinds supply chain to compromise multiple global victims with sunburst backdoor,” December 2020, accessed: 29/04- 2021. [Online]. Available: https://www.fireeye.com/blog/threat-research/2020/12/ evasive-attacker-leverages-solarwinds-supply-chain-compromises-with-sunburst-backdoor. html

[66] P.-M. Bureau and C. Dietrich, “Hiding in plain sight,” 2015. [Online]. Available: https://www.blackhat.com/docs/eu-15/materials/ eu-15-Bureau-Hiding-In-Plain-Sight-Advances-In-Malware-Covert-Communication-Channels-wp. pdf

[67] L. Macrohon and R. Mendrez, “Pingback: Backdoor at the end of the icmp tunnel,” May 2021, accessed: 08/05-2021. [Online]. Available: https://www.trustwave.com/ en-us/resources/blogs/spiderlabs-blog/backdoor-at-the-end-of-the-icmp-tunnel/

[68] V. Kopeytsev and S. Park, “Lazarus targets defense industry with threatneedle,” Febru- ary 2021.

[69] European Union Agency for Cybersecurity, “The year in review - enisa threat landscape,” European Union Agency for Cybersecurity, Tech. Rep., Oct 2020, accessed: 08/05-2021. [Online]. Available: https://www.enisa.europa.eu/publications/ year-in-review

[70] N. B. Lucena, G. Lewandowski, and S. J. Chapin, “Covert channels in ipv6,” in Privacy Enhancing Technologies, G. Danezis and D. Martin, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 147–166.

[71] Productivity Commission Staff Research Note, “On efficiency and effectiveness: some definitions,” 2013.

[72] T. Pevný, M. Kopp, J. Kroustek,ˇ and A. Ker, “Malicons: Detecting payload in favicons,” Electronic Imaging, vol. 2016, no. 8, pp. 1–9, feb 2016. [Online]. Available: https://doi.org/10.2352%2Fissn.2470-1173.2016.8.mwsf-079

[73] SOPHOS, “Sophos 2021threat reportnavigating cybersecurity in an uncertain world,” SOPHOS, Tech. Rep., Nov 2020, accessed: 05/8-2021. [On- line]. Available: https://www.sophos.com/en-us/medialibrary/pdfs/technical-papers/ sophos-2021-threat-report.pdf

[74] P. Reddy and S. Kumar, “Steganalysis techniques: A comparative study,” University of New Orleans Theses and Dissertations, 2007. [Online]. Available: https://scholarworks.uno.edu/td/562/

50 [75] S. Wiseman, “Content security through transformation,” Computer Fraud Security, vol. 2017, no. 9, pp. 5–10, 2017. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S1361372317300970

[76] “The solarwinds cyber-attack: What you need to know,” March 2021, accessed: 01/05-2021. [Online]. Available: https://www.cisecurity.org/solarwinds/

[77] Y. Qian, T. Sun, J. Li, C. Fan, and H. Song, “Design and analysis of the covert channel implemented by behaviors of network users,” Security and Communication Networks, vol. 9, no. 14, pp. 2359–2370, 2016. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/sec.1503

[78] I. Barwise, “Digital steganography as an advanced mal- ware detection evasion technique,” September 2020, ac- cessed: 08/05-2021. [Online]. Available: https://z3r0trust.medium.com/ digital-steganography-as-an-advanced-malware-detection-evasion-technique-c70357feb67f

[79] S. Wendzel, W. Mazurczyk, L. Caviglione, and M. Meier, “Hidden and uncontrolled - on the emergence of network steganographic threats,” 10 2014.

[80] A. Cohen, N. Nissim, and Y. Elovici, “Maljpeg: Machine learning based solution for the detection of malicious jpeg images,” IEEE Access, vol. 8, pp. 19 997–20 011, 2020.

[81] I. Corona and M. Marui, “Steganalysis and machine learning: A european answer,” September 2020, accessed: 12/05-2021. [Online]. Available: https: //simargl.eu/blog/technical/steganalysis-and-machine-learning

[82] J. Jackson, G. Gunsch, R. Claypoole, and G. Lamont, “Blind steganography detection using a computational immune system approach: A proposal,” 09 2002.

[83] B. T. McBride, G. L. Peterson, and S. C. Gustafson, “A new blind method for detecting novel steganography,” Digital Investigation, vol. 2, no. 1, pp. 50–70, 2005. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1742287605000046

[84] M. Komisarek, M. Pawlicki, R. Kozik, and M. Choras, “Machine learning based approach to anomaly and cyberattack detection in streamed network traffic data,” Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA), p. 3–19, Mar 2021. [Online]. Available: https://simargl.eu/publications/papers/machine-learning-anomaly-detection

[85] D. Spiekermann and J. Keller, “Unsupervised packet-based anomaly detection in virtual networks,” Computer Networks, vol. 192, p. 108017, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1389128621001286

[86] Aslan and R. Samet, “A comprehensive review on malware detection approaches,” IEEE Access, vol. 8, pp. 1–1, 01 2020.

[87] Kapersky, “The human factor in it security: How employees are making businesses vulnerable from within,” 2017, accessed: 28/05-2021. [Online]. Available: https://www.kaspersky.com/blog/the-human-factor-in-it-security/

51 [88] N. E. Team, “How to educate your employees on cybersecurity,” August 2018, accessed: 02/05-2021. [Online]. Available: https://www.ntiva.com/blog/ how-to-educate-your-employees-on-cybersecurity

[89] J. Phillips, “Deep secure launches content threat removal-as-a-service,” March 2019, accessed: 27/04-2021. [Online]. Available: https://www.intelligentciso.com/2019/03/ 04/deep-secure-launches-content-threat-removal-as-a-service/

[90] V. Team, “What is content threat removal (ctr)?” March 2021, ac- cessed: 22/04-2021. [Online]. Available: https://securityboulevard.com/2021/03/ what-is-content-threat-removal-ctr/

[91] M. Loffe, “Detecting the "next" solarwinds-style cyber attack,” April 2021, accessed: 05/05-2021. [Online]. Available: https://thehackernews.com/2021/04/ detecting-next-solarwinds-attack.html

[92] S. Zander, G. Armitage, and P. Branch, “Covert channels and countermeasures in com- puter network protocols,” IEEE Communications Surveys and Tutorials, vol. 9, pp. 44– 57, 09 2007.

[93] Z. Zorz, “Solarwinds hack investigation reveals new sunspot malware,” Help Net Security, January 2021, accessed: 25/04-2021. [Online]. Available: https: //www.helpnetsecurity.com/2021/01/12/solarwinds-sunspot/

[94] R. Priyanka, “Lazarus malware strikes south korean supply chains,” Cyber Safe, November 2020, accessed: 10/05-2021. [Online]. Available: https://www.cybersafe. news/lazarus-malware-strikes-south-korean-supply-chains/

[95] K. Zanki, “Spotting malicious excel4 macros,” Reversing Labs, April 2021, accessed: 02/05-2021. [Online]. Available: https://blog.reversinglabs.com/blog/ spotting-malicious-excel4-macros

[96] A. Osipov, “The introduction of the jupyter infostealer/backdoor,” November 2020. [Online]. Available: https://blog.morphisec.com/ jupyter-infostealer-backdoor-introduction

[97] Spanning Cloud Apps, “Stegware aka steganography malware – malware of the month, july 2019,” July 2019, accessed: 29/04-2021. [Online]. Available: https://securityboulevard.com/2019/07/ stegware-aka-steganography-malware-malware-of-the-month-july-2019/

[98] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in machine learning: Analyzing the connection to overfitting,” in 2018 IEEE 31st Computer Security Foun- dations Symposium (CSF), 2018, pp. 268–282.

[99] Bitdefender, “2020 consumer threat landscape report,” Bitde- fender, Tech. Rep., Apr 2021, accessed: 12/05-2021. [On- line]. Available: https://www.bitdefender.com/files/News/CaseStudies/study/395/ Bitdefender-2020-Consumer-Threat-Landscape-Report.pdf

52