Security Through Obscurity: Steganography

ABSTRACT: As the information age grows rapidly and data becomes highly valuable and sensitive, methods are needed to protect and secure that data. Steganography is one such method for transferring data securely over a network. This paper deals with some steganographic techniques and takes a detailed look at hiding information in images and in the TCP/IP protocol headers that are used frequently on the Internet.

INTRODUCTION
"Steganography is the art and science of communicating in a way which hides the existence of the communication". This basically comes down to using unnecessary bits in an innocent file to store sensitive data. The techniques are designed so that an observer cannot tell that there is anything inside the innocent file, while the intended recipient can still obtain the hidden data. This way, one can hide not only the message itself, but also the fact that a message is being sent.

GOAL OF STEGANOGRAPHY
"In contrast to Cryptography, where the enemy is allowed to detect, intercept and modify messages without being able to violate certain security premises guaranteed by a cryptosystem, the goal of Steganography is to hide messages inside other harmless messages in a way that does not allow any enemy to even detect that there is a second message present".

A DETAILED LOOK AT STEGANOGRAPHY
This section discusses Steganography at length. It deals with the different types practiced today, the principles behind them, and some of the steganographic techniques in use. This is where one can look at the nuts and bolts of Steganography and the different ways the technology can be used.

Let us first look at a theoretically perfect secret communication. To illustrate the concept, consider three fictitious characters named Amy, Bret and Crystal. Amy wants to send a secret message (M) to Bret. She picks a random (R) harmless message to serve as a cover (C), which can be sent to Bret without raising suspicion. Amy then turns the cover message (C) into a stego-object (S) by embedding the secret message (M) into the cover message (C) using a stego-key (K). Amy should then be able to send the stego-object (S) to Bret without being detected by Crystal. Bret can read the secret message (M) because he knows the stego-key (K) that was used to embed it into the cover message (C).

steganography_medium = secret message + cover message + key [11]
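The relation above can be made concrete with a small sketch. It is an illustration only, not a scheme taken from this paper: it assumes the cover is a plain byte string, that the stego-key is an integer used to seed Python's (non-cryptographic) pseudorandom generator, and that Bret knows the message length in advance. The function names embed and extract are hypothetical.

```python
import random

def embed(cover: bytes, message: bytes, key: int) -> bytes:
    """Return a stego-object S built from cover C, message M and stego-key K."""
    # Turn the message into a list of bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    # Assumption: the key seeds a PRNG that decides which cover bytes carry bits.
    positions = random.Random(key).sample(range(len(cover)), len(bits))
    stego = bytearray(cover)
    for pos, bit in zip(positions, bits):
        stego[pos] = (stego[pos] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)

def extract(stego: bytes, n_bytes: int, key: int) -> bytes:
    """Recover n_bytes of message from S using the same stego-key K."""
    # The same seed and arguments reproduce the same positions in the same order.
    positions = random.Random(key).sample(range(len(stego)), n_bytes * 8)
    bits = [stego[pos] & 1 for pos in positions]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

if __name__ == "__main__":
    cover = bytes(range(256)) * 2          # stand-in for harmless cover data
    stego = embed(cover, b"Hi", key=42)
    print(extract(stego, 2, key=42))
```

Without the key, Crystal does not know which bytes carry message bits. A real system would need a cryptographically strong generator and a cover whose low-order bits are naturally noisy.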

As Fabien A.P. Petitcolas points out, "in a 'perfect' system, a normal cover should not be distinguishable from a stego-object, neither by a human nor by a computer looking for statistical patterns." In practice, however, this is not always the case. In order to embed secret data into a cover message, the cover must contain a sufficient amount of redundant data or noise, because the embedding process replaces this redundant data with the secret message. This limits the types of data that one can use with Steganography.

In practice there are three types of steganography protocols in use: Pure Steganography, Secret Key Steganography and Public Key Steganography.

Pure Steganography
It is defined as a steganographic system that does not require the exchange of a cipher such as a stego-key. This is the least secure way to communicate secretly, because the sender and receiver can rely only on the presumption that no other party is aware of the secret message. On open systems such as the Internet, that presumption does not hold.

Secret Key Steganography
It is defined as a steganographic system that requires the exchange of a secret key (stego-key) prior to communication. Secret Key Steganography takes a cover message and embeds the secret message inside it using the secret key. Only the parties who know the secret key can reverse the process and read the secret message. Unlike Pure Steganography, where a perceived invisible communication channel is present, Secret Key Steganography exchanges a stego-key, which makes it more susceptible to interception. Its benefit is that even if the stego-object is intercepted, only parties who know the secret key can extract the secret message.

Public Key Steganography
It takes its concepts from Public Key Cryptography. Public Key Steganography is defined as a steganographic system that uses a public key and a private key to secure the communication between the parties wanting to communicate secretly. The sender uses the public key during the encoding process, and only the private key, which has a direct mathematical relationship with the public key, can decipher the secret message. Public Key Steganography provides a more robust way of implementing a steganographic system because it can build on the well-researched technology of Public Key Cryptography. It also has multiple levels of security: unwanted parties must first suspect the use of steganography, and then they would have to find a way to crack the algorithm used by the public key system before they could read the secret message.

ENCODING SECRET MESSAGE IN TEXT
Encoding secret messages in text can be a very challenging task, because text files contain very little redundant data that can be replaced with a secret message. Another drawback is the ease with which text based Steganography can be destroyed by an unwanted party, simply by changing the text itself or by reformatting it to some other form (from .TXT to .PDF, etc.). There are numerous methods by which to accomplish text based Steganography.
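As a minimal sketch of one such method, and of how fragile it is, the following hides bits in the spacing between words: one space stands for 0 and two spaces for 1. This is a plain-text approximation chosen for illustration, not a method prescribed by this paper, and the function names are hypothetical.

```python
import re

def embed_in_spacing(cover_text: str, bits: str) -> str:
    """Hide a string of '0'/'1' characters in the gaps between words."""
    words = cover_text.split()
    if not words or len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for this message")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Two spaces encode 1; a single space encodes 0 (also used for padding).
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)

def extract_from_spacing(stego_text: str, n_bits: int) -> str:
    """Read the first n_bits gaps back as '0'/'1' characters."""
    gaps = re.findall(r" +", stego_text)
    return "".join("1" if len(gap) == 2 else "0" for gap in gaps[:n_bits])

if __name__ == "__main__":
    stego = embed_in_spacing("the quick brown fox jumps", "101")
    print(repr(stego))
    print(extract_from_spacing(stego, 3))
    # Reformatting that normalizes whitespace wipes out the hidden bits.
    print(extract_from_spacing(" ".join(stego.split()), 3))
```

The last call shows the drawback noted above: any tool that collapses runs of spaces leaves the visible text intact but erases the message.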

Line-shift encoding
It involves shifting each line of text vertically up or down by a very small amount (as little as 1/300 of an inch in published schemes). Whether a line is shifted up or down relative to a stationary reference line determines the value that is encoded into the secret message.

Word-shift encoding
It works in much the same way as line-shift encoding, except that the horizontal spaces between words are used to carry the values of the hidden message. This method of encoding is less visible than line-shift encoding but requires that the text format support variable spacing.

Feature specific encoding
It involves encoding secret messages into formatted text by changing certain text attributes, such as the vertical or horizontal length of strokes in letters such as b, d, etc. This is by far the hardest text encoding method to intercept, as each type of formatted text has a large number of features that can be used for encoding the secret message.

All three of these text based encoding methods require either the original file or knowledge of the original file's formatting in order to decode the secret message.

ENCODING SECRET MESSAGES IN IMAGE
Coding secret messages in digital images is widely used today, because it can take advantage of the limited power of the human visual system (HVS). Almost any plain text, cipher text, image or other media that can be encoded into a bit stream can be hidden in a digital image. With the continued growth of graphics power in computers and the research being put into image based Steganography, this field will continue to grow at a very rapid pace.

To the computer, an image is an array of numbers that represent light intensities at various points (pixels). These pixels make up the image's raster data. When dealing with digital images for use with Steganography, 8-bit and 24-bit per pixel image files are typical. Both have advantages and disadvantages:

- 8-bit images are a great format to use because of their relatively small size. The drawback is that only 256 possible colors can be used, which can be a problem during encoding. Usually a gray scale color palette is used when dealing with 8-bit images such as .GIF, because its gradual change in color makes changes harder to detect after the image has been encoded with the secret message.
- 24-bit images offer much more flexibility when used for Steganography. The large number of colors (over 16 million) goes well beyond what the human visual system (HVS) can distinguish, which makes a secret message very hard to detect once it has been encoded. The one major drawback of 24-bit digital images is their large size (usually in MB), which makes them more suspect than the much smaller 8-bit digital images (usually in KB) when sent over an open system such as the Internet.

Digital image compression is a good solution for large digital images such as 24-bit images. There are two types of compression techniques. They are:

- Lossless compression is preferred when there is a requirement that the original information remain intact (as with steganographic images). The original message can be reconstructed exactly. This type of compression is typical in GIF and BMP images.
- Lossy compression, while also saving space, may not maintain the integrity of the original image. This method is typical in JPG images and yields very good compression.

The popular digital image encoding techniques used today are least significant bit (LSB) encoding, masking and filtering, transformation, spread spectrum steganography, statistical steganography, distortion, and cover generation steganography. Some of these techniques are described below.

Least significant bit (LSB) encoding
It is by far the most popular of the coding techniques used for digital images. By using the LSB of each byte (8 bits) in an image for the secret message, one can store 3 bits of data in each pixel of a 24-bit image and 1 bit in each pixel of an 8-bit image.

Logic: A 24-bit bitmap has 8 bits representing each of the three color values (red, green, and blue) at each pixel. If we consider just the blue, there are 2^8 = 256 different values of blue. The difference between, say, 11111111 and 11111110 in the value for blue intensity is likely to be undetectable by the human eye. If we do the same with the green and the red, each pixel carries 3 bits, so three pixels carry 9 bits, enough for one 8-bit letter of ASCII text. Therefore, the least significant bit can be used (more or less undetectably) for something other than color information. As this shows, much more information can be stored in a 24-bit image file.

The main disadvantage of LSB alteration is that it requires a fairly large cover image to create a usable amount of hiding space. Even today, uncompressed images of 800 x 600 pixels are not often used on the Internet, so using them might raise suspicion. Another disadvantage arises when an image concealing a secret is compressed with a lossy compression algorithm: the hidden message will not survive this operation and is lost after the transformation.

Masking and filtering
These techniques, usually restricted to 24-bit and grayscale images, take a different approach to hiding a message. They are effectively similar to paper watermarks, creating markings in an image. This can be achieved, for example, by modifying the luminance of parts of the image. While masking does change the visible properties of an image, it can be done in such a way that the human eye will not notice the anomalies. Since masking uses visible aspects of the image, it is more robust than LSB modification with respect to compression, cropping and different kinds of image processing.

Transformations
A more complex way of hiding a secret inside an image comes with the use and modification of discrete cosine transformations. Discrete cosine transformations (DCT) are used by the JPEG compression algorithm to transform successive 8 x 8 pixel blocks of the image into 64 DCT coefficients each. This approach follows the JSteg algorithm (D. Upham), which uses the JPEG image format. According to the JSteg algorithm, the least significant bits of the discrete cosine transform coefficients are replaced sequentially with the message data.

Logic: The secret data is inserted into the cover image in the DCT domain. The signature (secret message) DCT coefficients are encoded using a lattice coding scheme before embedding. Each block of cover DCT coefficients is first checked for its texture content, and the signature codes are inserted appropriately depending on a local texture measure. Experimental results indicate that high quality embedding is possible, with no visible distortions. Signature images can be recovered even when the embedded data is subject to significant lossy JPEG compression.

Each DCT coefficient F(u, v) of an 8 x 8 block of image pixels f(x, y) is given by:

\[ F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)\, \cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16} \]

where C(x) = 1/√2 when x equals 0 and C(x) = 1 otherwise. After calculating the coefficients, the following quantizing operation is performed:

\[ F^{Q}(u,v) = \operatorname{round}\!\left( \frac{F(u,v)}{Q(u,v)} \right) \]

where Q(u, v) is a 64-element quantization table. A simple pseudo-code algorithm to hide a message inside a JPEG image could look like this:

    Input: message, cover image
    Output: steganographic image containing message
    while data left to embed do
        get next DCT coefficient from cover image
        if DCT ≠ 0 and DCT ≠ 1 then
            get next LSB from message
            replace DCT LSB with message bit
        end if
        insert DCT into steganographic image
    end while

Although a modification of a single DCT coefficient will affect all 64 image pixels of its block, the LSB of the quantized DCT coefficient can be used to hide information. Losslessly compressed images are susceptible to visual alterations when their LSBs are modified. This is not the case with the method described above, as it takes place in the frequency domain of the image instead of the spatial domain, and therefore there will be no visible changes to the cover image.

In addition to DCT, images can be processed with the Fast Fourier transform (FFT). FFT is "an algorithm for computing the Fourier transform of a set of discrete data values". The FFT expresses a finite set of data points in terms of its component frequencies. It also solves the identical inverse problem of reconstructing a signal from the frequency data. The simple logic for encoding and decoding using transforms is as follows.

Hiding the data: Take the DCT or wavelet transform of the cover image and find the coefficients below a specific threshold. Replace bits of these coefficients with the bits to be hidden (for example, using LSB insertion), then take the inverse transform and store the result as a regular image.
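The coefficient-level LSB replacement used in JSteg-style embedding can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual JSteg implementation: it assumes the quantized DCT coefficients are already available as a flat list of integers (getting them in and out of a real JPEG file needs a codec, which is not shown), and the function names are hypothetical.

```python
def jsteg_embed(coeffs, message):
    """Write message bits into the LSBs of coefficients other than 0 and 1."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    out = []
    used = 0
    for c in coeffs:
        # Coefficients equal to 0 or 1 are skipped, as in the usual pseudo-code.
        if used < len(bits) and c not in (0, 1):
            # Python's bitwise operators treat negative ints as two's complement,
            # so this never turns a usable coefficient into 0 or 1.
            c = (c & ~1) | bits[used]
            used += 1
        out.append(c)
    if used < len(bits):
        raise ValueError("not enough usable coefficients for this message")
    return out

def jsteg_extract(coeffs, n_bytes):
    """Read n_bytes back from the LSBs of the usable coefficients."""
    bits = [c & 1 for c in coeffs if c not in (0, 1)][: n_bytes * 8]
    if len(bits) < n_bytes * 8:
        raise ValueError("not enough usable coefficients")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)

if __name__ == "__main__":
    # Made-up coefficient values, for illustration only.
    cover = [12, 0, -3, 1, 5, 0, 0, 2, -7, 4, 1, 9, -2, 6, 3, 8, 0, 10, -5, 7] * 2
    stego = jsteg_embed(cover, b"Hi")
    print(jsteg_extract(stego, 2))
```

Skipping the values 0 and 1 keeps the set of usable coefficients identical before and after embedding, which is what lets the receiver find the same coefficients without any side information beyond the message length.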

Recovering the data: To extract the hidden data, take the transform of the modified image and find the coefficients below a specific threshold. Extract bits of data from these coefficients and combine the bits into the actual message.

Patchwork
Patchwork is a statistical technique that uses redundant pattern encoding to embed a message in an image. The algorithm adds redundancy to the hidden information and then scatters it throughout the image.

Logic: A pseudorandom generator is used to select two areas of the image (or patches), patch A and patch B. All the pixels in patch A are lightened while the pixels in patch B are darkened. In other words, the intensities of the pixels in one patch are increased by a constant value, while the pixels of the other patch are decreased by the same constant value. The contrast change in this patch subset encodes one bit. The changes are typically small and imperceptible, and they do not change the average luminosity.

A disadvantage of the patchwork approach is that only one bit is embedded. One can embed more bits by first dividing the image into sub-images and applying the embedding to each of them. The advantage of this technique is that the secret message is distributed over the entire image, so should one patch be destroyed, the others may still survive. This, however, depends on the message size, since the message can only be repeated throughout the image if it is small enough. If the message is too big, it can only be embedded once.

There are also other methods, not discussed in this paper, which are of less utility than those above.

ENCODING INFORMATION IN A TCP/IP HEADER
The TCP/IP header contains a number of areas where information can be stored and sent to a remote host in a covert manner. Take the following diagrams, which are textual representations of the IP and TCP headers respectively.

IP Header (numbers represent bits of data from 0 to 32 and the relative position of the fields in the datagram)

Fig 5.1 Basic IP Header Structure [4]

TCP Header (numbers represent bits of data from 0 to 32 and the relative position of the fields in the datagram)

Fig 5.2 Basic TCP header structure [4]

Logic: Within each header there are a multitude of areas that are not used for normal transmission or are "optional" fields to be set as needed by the sender of the datagrams. An analysis of the areas of a typical IP header that are either unused or optional reveals many possibilities where data can be stored and transmitted.
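To make the idea of carrying data in a header field concrete, the sketch below packs one ASCII character into the upper byte of a 32-bit TCP initial sequence number and recovers it by dividing by 65536*256. It only computes the numbers and does not build or send packets. The function names and the optional low_bits padding are assumptions made for illustration.

```python
SCALE = 65536 * 256   # 2**24: shifts one byte into the top 8 bits of a 32-bit field

def char_to_isn(ch: str, low_bits: int = 0) -> int:
    """Encode one character as a 32-bit initial sequence number."""
    code = ord(ch)
    if not 0 <= code <= 255:
        raise ValueError("only single-byte characters fit in the upper byte")
    if not 0 <= low_bits < SCALE:
        raise ValueError("low_bits must fit in the lower 24 bits")
    # low_bits is hypothetical padding so the number looks less regular.
    return code * SCALE + low_bits

def isn_to_char(isn: int) -> str:
    """Decode the character by integer division, discarding the lower 24 bits."""
    return chr(isn // SCALE)

if __name__ == "__main__":
    isn = char_to_isn("H")
    print(isn)               # 72 * 16777216 = 1207959552
    print(isn_to_char(isn))  # H
    # One packet per character: a short message becomes a list of ISNs.
    print([char_to_isn(c) for c in "Hi"])
```

Because the receiver keeps only the quotient, anything placed in the lower 24 bits does not disturb the decoded character.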

For general purposes, this paper focuses on encapsulation of data in the more mandatory fields, because these fields are not as likely to be altered in transit as, say, the IP or TCP options fields, which are sometimes changed or stripped off by packet filtering mechanisms or through fragment re-assembly. They are:
- The IP packet identification field.
- The TCP initial sequence number field.
- The TCP acknowledged sequence number field.
Hence data can be placed into these fields. Although the ASCII code of a character could simply be placed in a field, it would not look innocent, so some special techniques are used to encode and decode the data for greater safety.

Manipulating IP packet identification field
The identification field of the IP protocol helps with re-assembly of packet data by remote routers and host systems. Its purpose is to give a unique value to packets so that if fragmentation occurs along a route, they can be accurately re-assembled. In the example, tcpdump shows the packets on a network between two hosts, "nemesis.psionic.com" and "blast.psionic.com"; one of the packets received during transmission carries the character 'H' in its IP packet identification field.

Manipulating Initial Sequence Number field (ISN)
The Initial Sequence Number field (ISN) of the TCP/IP protocol suite enables a client to establish a reliable protocol negotiation with a remote server. The approach is similar to the one above, but this field is 32 bits wide, so it serves as a perfect medium for transmitting clandestine data. Consider a packet whose ISN is 1207959552. Dividing it by 65536*256 gives 72, i.e. 'H'.

Because of the sheer amount of information one can represent in a 32-bit address space (4,294,967,296 numbers), the sequence number makes an ideal location for storing data. Aside from the obvious example given above, a number of other techniques are used to store information either byte by byte or as bits represented through careful manipulation of the sequence number. The simple algorithm of the covert tcp program takes the ASCII value of the data and converts it to a usable sequence number.

There are also other methods for hiding data in the TCP/IP header, which vary depending on the type of application and requirement. Data can also be hidden in audio and video files, which are not discussed in this paper but are widely used in open systems environments.

APPLICATIONS
The three most popular and researched uses for steganography in an open systems environment are covert channels, embedded data and digital watermarking.

- Covert channels in TCP/IP involve masking identification information in the TCP/IP headers to hide the true identity of one or more systems. This can be very useful for secure communication over open systems such as the Internet when absolute secrecy is needed for an entire communication process and not just one document, as in the next use.
- Embedding data using containers (cover messages) is by far the most popular use of Steganography today. This method is very useful when a party must send a top secret, private or highly sensitive document over an open systems environment such as the Internet. By embedding the hidden data into the cover message and sending it, one gains a measure of security from the fact that no one other than the intended recipients knows that more than a harmless message has been sent.
- Digital watermarking is usually used for copyright reasons by companies or entities that wish to protect their property, either by embedding their trademark into their property or by concealing serial numbers or license information in software, etc. Digital watermarking is very important in the detection and prosecution of software pirates and digital thieves.

CONCLUSION
Although only some of the main image steganographic techniques were discussed in this paper, one can see that there exists a large selection of approaches to hiding information in images. All the major image file formats have different methods of hiding messages, each with its own strong and weak points. Where one technique lacks in payload capacity, the other lacks in robustness. For example, the patchwork approach has a very high level of robustness against most types of attack, but can hide only a very small amount of information. Least significant bit (LSB) encoding in both BMP and GIF makes up for this in capacity, but both approaches result in suspicious files that increase the probability of detection in the presence of a warden. Thus, to decide which steganographic algorithm to use, an agent has to decide on the type of application the algorithm is intended for and whether some features can be compromised to ensure the security of others.

REFERENCES
[1]. Provos, N. and Honeyman, P., "Hide and Seek: An Introduction to Steganography", URL: http://niels.xtdnet.nl/papers/practical.pdf
[2]. Johnson, Neil F., "Steganography", 2000, URL: http://www.jjtc.com/stegdoc/index2.html
[3]. "Steganography", Wikipedia, the free encyclopedia, URL: http://en.wikipedia.org/wiki/Steganography
[4]. Murdoch, Steven J. and Lewis, Stephen, "Embedding Covert Channels into TCP/IP", URL: http://www.cl.cam.ac.uk/users/{sjm217,srl32}/
[5]. Krenn, R., "Steganography and Steganalysis", URL: http://www.krenn.nl/univ/cry/steg/article.pdf
[6]. Rowland, C.H., "Covert channels in the TCP/IP protocol suite", First Monday 2 (1997), URL: http://www.firstmonday.org
