A HIGH CAPACITY DATA-HIDING SCHEME IN LSB-BASED IMAGE

STEGANOGRAPHY

A Thesis

Presented to

The Graduate Faculty of The University of Akron

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

Rajanikanth Reddy Koppola

May, 2009

A HIGH CAPACITY DATA-HIDING SCHEME IN LSB-BASED IMAGE

STEGANOGRAPHY

Rajanikanth Reddy Koppola

Thesis

Approved:

Advisor: Dr. Xuan-Hien Dang

Co-Advisor: Dr. Yingcai Xiao

Committee Member: Dr. Zhong-Hui Duan

Department Chair: Dr. Wolfgang Pelz

Accepted:

Dean of the College: Dr. Chand Midha

Dean of the Graduate School: Dr. George R. Newkome

Date: ______

ABSTRACT

The phenomenal growth of e-commerce applications on the World Wide Web has increased the need for secure data communication over the Internet, especially for the transfer of highly sensitive documents. Steganography techniques were introduced and developed to provide security to these applications.

Steganography is a digital technique for hiding secret information in some form of media, such as image, audio or video. It has evolved into a practice of concealing data in a larger file in such a way that others cannot suspect the presence of a hidden message. This technique is generally used to maintain the confidentiality of valuable information and to protect data from possible theft or unauthorized viewing.

In this work, we present a new LSB-based steganography technique to embed a large amount of data in RGBA images while keeping the perceptual degradation to a minimum. We propose to use the YIQ color space model to transform the RGB value of each secret image pixel into three separate components, which are then embedded into the least significant bits of each color pixel in the cover image, as well as in the alpha channel. Results obtained from simulation demonstrate that it is possible to attain practical hiding capacities of up to 100% of the cover image while maintaining acceptable image quality.

ACKNOWLEDGEMENTS

I would like to thank my advisor, Dr. Xuan-Hien Dang, for her constant support and invaluable guidance during this work. I am grateful to her for offering me an opportunity to work under her. This thesis work would not have been possible without her constant help and support.

I would like to thank Dr. Xiao, my Co-Advisor, and Dr. Duan, my Committee Member, for their time and constant support throughout my course work.

I would also like to thank Dr. Wolfgang Pelz for his advising throughout my course of study.

I would like to dedicate this thesis to my parents, who are the first teachers of my life. Without their encouragement, love and support, I would not have been able to reach this stage of my life. I am forever indebted to them for the sacrifices they made to help me achieve this success.

TABLE OF CONTENTS

Page

LIST OF TABLES...... vii

LIST OF FIGURES ...... viii

CHAPTER

I. INTRODUCTION ...... 1

II. BACKGROUND AND RELATED WORK...... 7

2.1 Definitions and Terminologies ...... 7

2.2 Data Hiding Techniques ...... 7

2.2.1 Injection ...... 8

2.2.2 Substitution ...... 8

2.2.3 Generation...... 9

2.3 Substitution Algorithms...... 9

2.3.1 Spatial Domain Algorithm...... 10

2.3.2 Transform Domain Algorithm...... 11

2.4 YIQ Color Model...... 12

2.5 BMP File Format ...... 13

2.5.1 Header...... 14

2.5.2 Information Header...... 15

2.5.3 Optional Palette ...... 16

2.5.4 Image Data...... 17

2.6 Alpha Channel ...... 17

2.6.1 RGBA Color Space...... 18

2.7 Existing Techniques on Hiding Large Amount of Data ...... 20

III. DESIGN AND IMPLEMENTATION ...... 22

3.1 Design ...... 22

3.2 Algorithm and Implementation...... 25

3.2.1 Data Hiding Algorithm ...... 25

3.2.2 Extraction Algorithm ...... 28

IV. METRICS, RESULTS AND DISCUSSIONS ...... 31

4.1 Performance Metrics...... 31

4.2 Experimental Setup...... 33

4.3 Amount of Secret Data Hidden in Cover Images ...... 37

4.4 Euclidean Distance of Cover and Stego Images...... 39

4.5 Brightness Information of Cover and Stego images...... 40

4.6 PSNR of Cover and Stego Images...... 42

4.7 Disadvantage of the Proposed Technique...... 43

V. CONCLUSION...... 48

5.1 Conclusion ...... 48

5.2 Future Work...... 49

REFERENCES ...... 50

LIST OF TABLES

Table Page

2.1 Header...... 15

2.2 Information Header...... 16

2.3 Compression Information ...... 16

4.1 Dataset of Images Used in Experiments ...... 33

4.2 Amount of Data Hidden in Stego Images...... 38

4.3 PSNR Table of the Cover and Stego Images...... 47

LIST OF FIGURES

Figure Page

2.1 Image Steganography System...... 9

2.2 Image Header...... 14

2.3 Desktop Transparent Image ...... 19

2.4 Composite Transparent Image ...... 20

3.1 Proposed Hiding Procedure ...... 25

3.2 Pseudo Code for the Proposed Embedding Algorithm...... 28

3.3 Image Extraction System ...... 29

4.1 Blue Hills ...... 34

4.2 Sunset...... 34

4.3 Flower ...... 35

4.4 Lena...... 36

4.5 Map ...... 37

4.6 Amount of Data Hidden in Cover Images ...... 38

4.7 Euclidean Distance of Cover Image and Secret Images ...... 40

4.8 Brightness Information of Cover and Stego Images...... 41

4.9 Cover Image...... 41

4.10 Stego Image ...... 42

4.11 Secret Image...... 44

4.12 Retrieved Secret Image...... 45

4.13 Euclidean Distance of Original Secret Image and Retrieved Secret Image...... 46

4.14 Brightness Information of Original Secret Image and Retrieved Secret Image ...... 46

4.15 PSNR of the Cover and Stego Image...... 47

CHAPTER I

INTRODUCTION

Flows of information generated by many diverse applications, such as e-commerce transactions, audio and video streaming or online chatting, are constantly communicated through the Internet. The security of such data communication, which is vital for many applications nowadays, has been a major concern and an ongoing topic of study given that the Internet is by design open and public in nature. Many techniques have been proposed for providing secure transmission of data. Data encryption and information hiding techniques have become popular and generally complement each other. Whereas encryption transforms data into seemingly meaningless bits, called ciphertext, through the use of sophisticated and robust algorithms, information hiding [1] is the process of concealing messages in such a way that no one apart from the sender and the intended receiver even knows that there is a hidden message. The word steganography is of Greek origin and means “covered or hidden writing” [2]. The technique was used in ancient times, when secret messages were tattooed on the shaven heads of messengers. The messengers were sent away after their hair had grown back, and their heads were later shaved again to recover the messages.

The general idea of hiding secret information in media has a wider range of applications that go beyond steganography. For example, an image printed on a document could be annotated with metadata that leads a user to a higher-resolution version of it. Due to the proliferation of digital images and the high degree of redundancy present in them, there is an increased interest in using images as the cover object in steganography. The Least-Significant-Bit (LSB) technique is one of the most widely used schemes for image steganography. It modifies the LSB planes of the image: the message is stored in the least significant bits of the pixels, which can be considered random noise, so altering them does not significantly affect the quality of the cover image. Variations of the LSB algorithm modify one or more least significant bits. The motivation for this study is to provide security to confidential RGB images such as maps or sensitive signed documents. The basic principle of steganography is to hide the secret information in the cover object, which can be a digital medium such as an image, audio or video file, to obtain a stego file that has the secret information hidden in it.

The different types of steganography techniques are substitution, transform domain, spread spectrum, statistical, distortion and cover generation techniques. Substitution techniques replace the least significant bits of each pixel in the cover file with bits from the secret document. The transform domain technique hides secret information in the transform space (such as the frequency domain) by modifying the least significant coefficients of the cover file. Most research in the category of transform domain embedding is focused on taking advantage of redundancies in the Discrete Cosine Transform (DCT). This technique is mostly used for JPEG images, where the DCT serves to compress the image. Changing a large number of coefficients does not produce any visible alterations but causes large changes in the compression rate. Therefore the embedding capacity of the DCT technique is lower than that of the LSB technique. Spread spectrum techniques spread hidden information over different bandwidths. Even if parts of the message are removed from several bands, there would still be enough information present in other bands to recover the message. Statistical techniques change several statistical properties of the cover file, which is split into blocks where each block is used to hide one message bit; a cover block is modified only when its message bit is ‘1’. Distortion techniques exploit signal distortion to hide information. For example, the sender applies a sequence of modifications to the cover file which corresponds to the secret information. The receiver then measures the differences between the original cover and the distorted cover to detect the sequence of modifications and consequently recover the secret message. Cover generation techniques differ from the other steganography techniques. Typically a cover object is chosen to hide the secret message, but this technique creates a cover object for the purpose of hiding the information, for example by transforming secret message bits into sentences by selecting words out of a dictionary.

Two important properties of a steganographic technique are perception and payload [3]. Steganography generally exploits human perception because human senses are not trained to look for files that have hidden information inside them. Steganography therefore disguises information from people who try to intercept it. Payload is the amount of information that can be hidden in the cover object.

Many steganographic techniques have been introduced to increase the payload. S-Tools is one of the popular steganographic tools available online and is based on the LSB technique. It hides data in the least significant bit of each color pixel in the image. The main disadvantage of this technique is that it can hide a secret message of only 12% of the cover image data, although it maintains very good perceptual image quality. In 2003, a new spatial domain technique called Bit Plane Complexity Segmentation (BPCS) [4] was proposed based on the Most Significant Bit technique. It hides data in the higher bit planes of the cover image. This technique could not hide a large amount of data because changing the most significant bits can cause significant changes in perception. In 2005, Yeuan-Kuen Lee and Ling-Hwei Chen [5] proposed a technique based on the LSB algorithm. They used the properties of image contrast and luminance to hide the data in the 4 lower bits of the cover image pixels, which showed good results in terms of perception. In 2005, Seppanen, Makela and Keskinarkaus [6] proposed a new algorithm to hide a large amount of data, which has the advantage of hiding data in the 6 LSB bits of the cover image with a lower level of noise. They could achieve a hiding capacity of up to 60% of the cover image data. In 2007, Nameer [7] proposed an algorithm to improve the efficiency of the payload. The advantage of the technique is that it implements a high level of security while hiding the factors of the data value in 4 LSB bits of the cover image. All these algorithms show how steganography techniques have improved to extend hiding capacity to a very high payload. The most obvious limitation of these techniques is that the cover image must be very large compared to the secret information. We can hide a large amount of information in multiple files, but this could lead to suspicion. Therefore it is very important to use only one image file to hide the entire secret information.

In this work, we present a new steganographic technique to embed a large amount of data in RGBA images while keeping the perceptual degradation to a minimum. This technique allows hiding an uncompressed color image in an uncompressed color image. Our motivation for hiding images in images is to provide security to images that contain confidential information. For example, sensitive documents can be scanned and embedded into an image, which can then be sent confidentially using this technique. We also note that the NTSC (National Television System Committee) system broadcasts TV signals using the YIQ color system. The YIQ components are broadcast as the signal to televisions, which then convert them back to the RGB color system. Black and white TVs use the Y component to project the picture onto the screen, while color televisions use all three YIQ components. The proposed technique is similar to the transmission done by the NTSC system. The major challenge here is to increase the hiding capacity to almost the same size as the cover image. We propose to utilize the YIQ color model, where Y is the grayscale value of the image while I and Q are the color components. We transform the RGB pixel values of the secret image into the YIQ color space and then use the LSB technique to hide the data in the least significant bits of the cover image. The YIQ color model lets us hide the data in regions where human eyes cannot perceive the change (color blindness). We use about 13 LSB bits of the cover image to hide the transformed grayscale value of the secret image. This changes the quality of the cover image, but the results show that good perception is maintained because the data is hidden in regions where the human eye cannot spot changes in image quality, such as the blue color.

We also propose to utilize the alpha value of the cover image to hide data. In almost all previous techniques, the alpha value has been ignored or not used to hide data because most images in the past did not have an alpha value. Only PNG images have an alpha component that describes the transparency of the image. These days, however, a significant number of “transparent” or layered images can be found posted online; therefore, in this work we also utilize the least significant bits of the alpha value to embed data. Our experimental results showed that changing the 3 LSB bits of the alpha value does not cause significant changes in the perception of the image. We demonstrated that it is possible to attain a practical hiding capacity of up to 100% of the cover image size. This technique holds up well against visual attacks, but its disadvantage is that it is detectable by statistical attacks because we are using 13 LSB bits of the cover image. However, this can be overcome by applying transform domain and compression techniques to increase the security of the cover image [8].

This thesis is organized as follows. Chapter 2 presents a brief background on steganography techniques. The proposed scheme is detailed in Chapter 3, and an analysis of the results is presented in Chapter 4. Finally, Chapter 5 provides the conclusion and future work to improve the proposed technique.

CHAPTER II

BACKGROUND AND RELATED WORK

2.1 Definitions and Terminologies

Steganography techniques aim at secretly hiding data in a multimedia carrier such as text, audio, image or video, without raising any suspicion of alteration to its contents. The original carrier is referred to as the cover object. In this work, we mainly focus on image steganography, so the cover object is a cover image. Figure 2.1 illustrates a basic information hiding system in which the embedding technique takes a cover image and a secret image as inputs and produces as output a stego image, which is the seemingly unchanged cover image with the embedded data. The stego image may be sent over communication links to the receiver, who can then carry out the extraction procedure to retrieve the secret message from the stego image.

2.2 Data Hiding Techniques

There are three different approaches that can be used to hide information in a cover object: injection, substitution and generation.

2.2.1 Injection

With the injection technique [14], data can be hidden in sections of a file that are ignored by the processing application. The file bits that are relevant to an end-user are therefore not modified, leaving the cover file perfectly usable. For example, we can add additional harmless bytes to an executable or binary file. Because those bytes do not affect the process, the end-user may not even realize that the file contains additional hidden information. However, using an injection technique changes the file size according to the amount of data hidden; if the file looks unusually large, it may arouse suspicion.

2.2.2 Substitution

The substitution technique replaces the least significant bits of the information that determines the meaningful content of the original file with new data, in a way that causes the least amount of distortion. The main advantage of this technique is that the cover file size does not change after the execution of the algorithm. On the other hand, this approach has at least two drawbacks. First, the resulting stego object may be adversely affected by quality degradation, and that may arouse suspicion. Second, substitution limits the amount of data that can be hidden to the number of insignificant bits in the file.

2.2.3 Generation

Unlike injection and substitution, generation techniques [15] do not require an existing cover file. This technique generates a cover file for the sole purpose of hiding the message. The main flaw of the injection and substitution techniques is that people can compare the stego object with any pre-existing copy of the cover object (which is supposed to be the same object) and discover differences between the two. This problem does not arise with a generation approach, because the result is an original file and is therefore immune to comparison tests.

[Block diagram: cover image and secret image, data-hiding algorithm, noise/distortion, extraction algorithm, cover image and secret image]

Figure 2.1 Image Steganography System

2.3 Substitution Algorithms

There is an increased interest in using digital images as cover objects for the purpose of steganography because of the proliferation of digital images over the Internet and the high degree of redundancy present in the digital representation of an image (despite compression). A number of image steganography algorithms are based on the substitution approach. They can be categorized into two types: spatial domain techniques and transform domain techniques. In the spatial domain approach, the cover image pixels are directly used to inscribe bits of the secret data, whereas in the transform domain approach, the cover image first undergoes a transformation into its frequency domain and its transformed coefficients are then altered to embed the secret information.

2.3.1 Spatial Domain Algorithm

Spatial domain algorithms embed data by substituting carefully chosen bits from the cover image pixels with secret message bits. LSB-based techniques are the most widely known steganography algorithms; they work by replacing the least significant bits of an image pixel. These modifications can be interpreted as random noise, which should not have any perceptible effect on the image. This is usually an effective technique in cases where the LSB substitution does not cause significant quality degradation, such as in 24-bit images. Some algorithms change the LSBs of pixels visited in a random walk, others modify pixels in certain areas of the image, and others simply increment or decrement the pixel value [12]. Our proposed technique is based on the LSB technique.

For example, to hide the letter "a" (ASCII code 97, that is 01100001) inside eight bytes of a cover, we set the LSB of each byte like this:

10010010 01010011 10011011 11010010

10001010 00000010 01110010 00101011

The application decoding the cover reads the eight least significant bits of those bytes to re-create the hidden byte, that is 01100001, the letter "a."
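The example above can be reproduced with a few lines of Python. This is only a minimal sketch of single-bit LSB substitution: the function names `embed_byte` and `extract_byte` and the starting cover bytes are illustrative assumptions, chosen so that the resulting stego bytes match the eight bytes listed in the text.

```python
def embed_byte(cover: bytes, secret: int) -> bytes:
    """Write the 8 bits of `secret` (MSB first) into the LSBs of 8 cover bytes."""
    if len(cover) != 8 or not 0 <= secret <= 255:
        raise ValueError("need exactly 8 cover bytes and a secret in 0..255")
    stego = bytearray()
    for i, value in enumerate(cover):
        bit = (secret >> (7 - i)) & 1        # i-th bit of the secret, MSB first
        stego.append((value & 0xFE) | bit)   # clear the LSB, then set it to that bit
    return bytes(stego)


def extract_byte(stego: bytes) -> int:
    """Rebuild the hidden byte from the LSBs of the first 8 stego bytes."""
    secret = 0
    for value in stego[:8]:
        secret = (secret << 1) | (value & 1)
    return secret


# Hypothetical cover bytes: the bytes from the text with every LSB flipped.
cover = bytes([0b10010011, 0b01010010, 0b10011010, 0b11010011,
               0b10001011, 0b00000011, 0b01110011, 0b00101010])
stego = embed_byte(cover, ord("a"))
recovered = chr(extract_byte(stego))
```

Each cover byte changes by at most 1, which is why single-bit substitution is hard to notice visually.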

2.3.2 Transform Domain Algorithm

Transform domain techniques [13] hide data in the mathematical functions that are used in compression algorithms. The Discrete Cosine Transform (DCT), which expresses a waveform as a weighted sum of cosines, is one of the most commonly used transform domain algorithms. The data is hidden in the image file by altering the DCT coefficients of the image. Specifically, DCT coefficients which fall below a specific threshold are replaced with the secret bits. Taking the inverse transform then yields the stego image. The extraction process consists of retrieving those specific DCT coefficients.

Our proposed technique is based on the LSB technique and replaces more than one bit of each pixel to hide secret data. The security of the secret message can be enhanced by combining the LSB technique with the Discrete Cosine Transform (DCT) and a compression technique [8]. The LSB technique is used to hide the secret image bits in the cover image to obtain the stego image. The stego image is then transformed from the spatial domain to the frequency domain using the DCT. Finally, quantization and run-length coding algorithms [8] can be used to compress the stego image to enhance the security.

2.4 YIQ Color Model

The YIQ color model was introduced in the 1940s [14] and is used in U.S. commercial color television broadcasting. It is a recoding of RGB for transmission efficiency and for downward compatibility with black and white television. It is transmitted using the NTSC (National Television System Committee) system [14]. Only one video signal was transmitted to both black and white and color televisions, so color had to be added to the monochrome video signal. The first step was to analyze and quantify the properties of human perception. The Commission Internationale de l'Éclairage (CIE) was established to define an average human observer. The human eye is most sensitive to green or yellow light and least sensitive to red or blue light. It was found that the monochrome resolution of the eye is much greater than its color resolution. As details become very small, all the eye can discern is changes in brightness. Beyond a certain level of detail, color cannot be distinguished and the human eye therefore becomes color blind.

Colors in an image can be converted to a shade of gray by calculating the effective brightness or luminance of the color and using this value to create a shade of gray that matches the desired brightness. The effective luminance of a pixel is calculated with the following formula, given here with the standard NTSC weights:

Y = 0.299R + 0.587G + 0.114B

The Y component of YIQ is the luminance component of the color TV signal and is the component shown on black and white televisions. The chromaticity is encoded in the I and Q components. The YIQ model solves a major problem with material being prepared for broadcast television: two different colors in adjacent pixels appear different on a color monitor, but when converted to YIQ and viewed on a monochrome monitor they may appear to be the same. This can be avoided by specifying the two colors with different Y values. The model exploits two useful properties of the human visual system. First, the system is more sensitive to changes in luminance than to changes in hue and saturation; that is, our ability to discriminate color information spatially is weaker than our ability to discriminate monochrome information spatially. This suggests that more bits of bandwidth should be used to represent Y than are used to represent I and Q, so as to provide higher resolution in Y. Second, objects that cover a very small part of our field of view produce a limited color sensation, which can be specified adequately with one rather than two color dimensions. This suggests that either I or Q can have lower bandwidth. The NTSC encoding of YIQ into a broadcast signal uses these properties to maximize the amount of information transmitted in a fixed bandwidth.
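As a rough illustration of the color model described in this section, the sketch below converts between RGB and YIQ. The coefficients are the commonly quoted NTSC values rounded to three decimals, which is an assumption on our part (other sources round them slightly differently), so the inverse conversion is only approximate. The function names are hypothetical.

```python
def rgb_to_yiq(r, g, b):
    """Convert an RGB triple to (Y, I, Q) using rounded NTSC coefficients (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (grayscale value)
    i = 0.596 * r - 0.274 * g - 0.322 * b   # first chromaticity component
    q = 0.211 * r - 0.523 * g + 0.312 * b   # second chromaticity component
    return y, i, q


def yiq_to_rgb(y, i, q):
    """Approximate inverse of rgb_to_yiq (coefficients also rounded)."""
    r = y + 0.956 * i + 0.621 * q
    g = y - 0.272 * i - 0.647 * q
    b = y - 1.106 * i + 1.703 * q
    return r, g, b


# A gray pixel carries no chromaticity: I and Q are (numerically) zero.
gray = rgb_to_yiq(128, 128, 128)

# Round trip for a colored pixel; rounding recovers the original 8-bit values here.
restored = tuple(round(c) for c in yiq_to_rgb(*rgb_to_yiq(200, 100, 50)))
```

Because the weights in the Y row sum to 1, a gray pixel keeps its value in Y, which is what a black and white receiver displays.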

2.5 BMP File Format

The BMP file format is an image file format used to store digital images [16]. In uncompressed BMP files and many other bitmap file formats, image pixels are stored with a color depth of 1, 4, 8, 16, 24 or 32 bits per pixel. An alpha channel for transparency may be stored in a separate file or in a fourth channel that converts 24-bit images to 32 bits per pixel. Uncompressed bitmap files such as BMP files are typically much larger than compressed image file formats for the same image. For example, an image of 1058 × 1058 pixels occupies about 287.65 KB in PNG format, while as a 24-bit BMP file it occupies about 3358 KB. Uncompressed formats are therefore generally unsuitable for transferring images over the Internet or other slow or capacity-limited media.
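The size quoted above can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 54-byte header is the 14 + 40 bytes described later in this chapter, and the padding of each row to a multiple of 4 bytes is a property of the BMP format that the text does not mention.

```python
def raw_pixel_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes of pixel data with no row padding (for 24- or 32-bit images)."""
    return width * height * bits_per_pixel // 8


def bmp_file_bytes(width: int, height: int, bits_per_pixel: int,
                   header_bytes: int = 54) -> int:
    """Approximate size of an uncompressed BMP without a palette."""
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4   # rows padded to 4 bytes
    return header_bytes + row_bytes * height


raw_24 = raw_pixel_bytes(1058, 1058, 24)    # 3,358,092 bytes, about 3358 KB
file_24 = bmp_file_bytes(1058, 1058, 24)    # slightly larger once padding is included
raw_32 = raw_pixel_bytes(1058, 1058, 32)    # the same image with an alpha channel
```

The figure of about 3358 KB corresponds to the raw 24-bit pixel data counted in units of 1000 bytes; adding an alpha channel raises it by one third.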

A BMP image file consists of either three or four parts, as shown in Figure 2.2 [18]. The first part is the header, the second is the information header, the third is the optional palette and the fourth is the pixel data. The header contains the position of the image data with respect to the start of the file. The information header contains information about the image such as its width and height, the type of compression and the number of colors.

[Layout: HEADER (14 bytes) | INFORMATION HEADER (40 bytes) | OPTIONAL PALETTE | IMAGE DATA]

Figure 2.2 Image Header

2.5.1 Header

The header contains two main fields, type and offset. The type field is used for a simple check of whether the file is a BMP file or not. The offset field gives the number of bytes before the actual pixel data. This block of bytes is at the beginning of the file and is used to identify the file. An application reads this block to ensure that the file is actually a BMP file and is not damaged. Table 2.1 [16] shows the header fields.

Table 2.1 Header

Offset  Size  Purpose
0       2     The number used to identify the BMP file
2       4     The size of the BMP file in bytes
6       2     Reserved; actual value depends on the application that creates the image
8       2     Reserved; actual value depends on the application that creates the image
10      4     The offset, i.e. starting address of the byte where the bitmap data can be found

2.5.2 Information Header

Table 2.2 shows the information header, whose four main fields are the image width and height, the number of bits per pixel, the number of planes and the compression type. This block of bytes gives the application detailed information about the image, which is used to display the image on the screen. It also matches the header used internally by Windows, which has several different variants; all of them contain a field which specifies the header size, so any application can easily determine which header is used in the image. Table 2.3 [16] shows the possible compression methods.

Table 2.2 Information Header

Offset  Size  Purpose
14      4     The size of the header (40 bytes)
18      4     The bitmap width in pixels (signed integer)
22      4     The bitmap height in pixels (signed integer)
26      2     The number of color planes used
28      2     The number of bits per pixel used (1, 4, 8, 16, 24, 32)
30      4     The compression method being used
34      4     The size of the image
38      4     The horizontal resolution of the image
42      4     The vertical resolution of the image
46      4     The number of colors in the color palette (0 or default 2^n)
50      4     The number of important colors used, or 0 when every color is important

Table 2.3 Compression Information

Value  Identified by  Compression      Comments
0      BI_RGB         None             Most common
1      BI_RLE8        RLE 8 bit/pixel  Can be used only with 8 bit/pixel
2      BI_RLE4        RLE 4 bit/pixel  Can be used only with 4 bit/pixel
3      BI_BITFIELDS   Bit field        Can be used only with 16 or 32 bit/pixel
4      BI_JPEG        JPEG             The bitmap contains a JPEG image
5      BI_PNG         PNG              The bitmap contains a PNG image
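The field offsets in Tables 2.1 and 2.2 map directly onto a little-endian binary layout. The following sketch reads them with Python's standard `struct` module. It is an illustration rather than part of the proposed scheme: the function name, the dictionary keys and the synthetic 2 × 2 sample are assumptions, and the check for the `BM` signature reflects the usual identifying value of BMP files.

```python
import struct

FILE_HEADER = struct.Struct("<2sIHHI")        # 14 bytes: offsets 0, 2, 6, 8, 10
INFO_HEADER = struct.Struct("<IiiHHIIiiII")   # 40 bytes: offsets 14 to 50


def parse_bmp_headers(data: bytes) -> dict:
    """Return the main header fields of a BMP file held in memory."""
    if len(data) < FILE_HEADER.size + INFO_HEADER.size:
        raise ValueError("too short to contain both BMP headers")
    magic, file_size, _res1, _res2, pixel_offset = FILE_HEADER.unpack_from(data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    (header_size, width, height, planes, bits_per_pixel, compression,
     image_size, _x_res, _y_res, _colors, _important) = INFO_HEADER.unpack_from(data, 14)
    return {
        "file_size": file_size, "pixel_offset": pixel_offset,
        "header_size": header_size, "width": width, "height": height,
        "planes": planes, "bits_per_pixel": bits_per_pixel,
        "compression": compression, "image_size": image_size,
    }


# Synthetic 2 x 2, 24-bit image: 54 header bytes plus two padded 8-byte rows.
sample = (FILE_HEADER.pack(b"BM", 70, 0, 0, 54)
          + INFO_HEADER.pack(40, 2, 2, 1, 24, 0, 16, 2835, 2835, 0, 0)
          + bytes(16))
info = parse_bmp_headers(sample)
```

A compression value of 0 corresponds to BI_RGB in Table 2.3, i.e. uncompressed pixel data.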

2.5.3 Optional Palette

The optional palette occurs in the BMP file directly after the BMP header. It is a block of bytes listing the colors available for use in a particular indexed-color image. Each pixel in the image is described by 1, 4 or 8 bits which index a single color in this table. The purpose of the color palette in indexed-color bitmaps is to tell the application the actual color that each index value corresponds to. A color is defined using the three values R, G and B. The color palette is not used if the bitmap is 16-bit or higher; such BMP files contain no palette bytes.

2.5.4 Image Data

The image data block describes the image pixel by pixel. Pixel data are stored upside down with respect to the normal raster scan order: storage starts in the lower left corner and proceeds from left to right within a row, and then row by row from the bottom to the top of the image.
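The bottom-up storage order can be undone when reading the pixel data. The sketch below is a hypothetical helper, not part of the proposed algorithm; it assumes uncompressed 24- or 32-bit data and the 4-byte row padding used by the BMP format.

```python
def rows_top_down(pixel_data: bytes, width: int, height: int,
                  bits_per_pixel: int = 24) -> list:
    """Split BMP pixel data into rows and return them in top-to-bottom order."""
    row_size = (width * bits_per_pixel + 31) // 32 * 4   # stored row length with padding
    used = width * bits_per_pixel // 8                   # bytes that hold real pixels
    rows = [pixel_data[r * row_size: r * row_size + used] for r in range(height)]
    rows.reverse()                                       # the file stores the bottom row first
    return rows


# 2 x 2 example: the bottom row (all 0x01) is stored before the top row (all 0x02).
bottom = b"\x01" * 6 + b"\x00\x00"
top = b"\x02" * 6 + b"\x00\x00"
rows = rows_top_down(bottom + top, width=2, height=2)
```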

2.6 Alpha Channel

In computer graphics, alpha compositing is the technique of combining an image with a background to create the appearance of partial transparency. It is useful to render image elements in separate passes and then combine them into a final image. To combine these elements correctly, it is important to keep an associated matte for each element, which contains the coverage information such as the shape of the geometry. To store this information, the alpha channel was introduced by A. R. Smith in the 1970s [17]. In 2D images, each pixel stores a color value. For transparent images, it stores an extra value called the alpha value. A value of 0 means that the pixel has no coverage information and is fully transparent. A value of 1 (or 255 in an 8-bit representation) means that the pixel is fully opaque.

2.6.1 RGBA Color Space

RGBA stands for Red Green Blue Alpha. It extends the RGB color model with an alpha value. The alpha channel was invented in 1971 by Catmull and Smith [18] and is named after the Greek letter in the classic linear interpolation formula αA + (1-α)B. This channel is used as an opacity channel. The alpha value varies from 0 to 255, where 0 means completely transparent and 255 means opaque. PNG images follow the RGBA color model.
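The linear interpolation αA + (1-α)B can be written out for 8-bit pixels as follows. This is a minimal sketch: the function name `blend` and the rounding to the nearest integer are assumptions, and the alpha value is scaled from the 0 to 255 range described above to the range 0 to 1.

```python
def blend(foreground, background, alpha):
    """Blend two RGB triples with an 8-bit alpha value (0 = transparent, 255 = opaque)."""
    a = alpha / 255.0   # scale alpha to the range 0..1
    return tuple(round(a * f + (1 - a) * b) for f, b in zip(foreground, background))


red, blue = (255, 0, 0), (0, 0, 255)
opaque = blend(red, blue, 255)       # only the foreground is visible
transparent = blend(red, blue, 0)    # only the background is visible
half = blend(red, blue, 128)         # roughly an even mix of both
```

Changing only the lowest bits of alpha shifts the blend weight by a few parts in 255, which is why small changes to the alpha value are hard to see.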

In our proposed technique, RGB images serve as cover images. However, not all RGB images contain an alpha value. If plain RGB images are used to hide the information, it can lead to suspicion because the default alpha value in such images is 255 for every pixel, so modified alpha values would stand out. In RGBA images the alpha value is not the same in all pixels of the image. The proposed technique therefore gives much better results if RGBA images are used. Figures 2.3 [25] and 2.4 [26] are examples of transparent images.


Figure 2.3 Desktop Transparent Image


Figure 2.4 Composite Transparent Image

2.7 Existing Techniques on Hiding Large Amount of Data

S-Tools is a steganography tool available online that hides data in pictures, sound and other media. It is based on the LSB technique. This tool can hide only 12% of the cover image data because it uses only three bits of each pixel to hide the data, but it maintains very good perceptual quality of the cover image.

In 2003, Yeshwanth Srinivasan [4] proposed a spatial domain technique called bit plane complexity segmentation (BPCS) steganography, which hides a large amount of data. This technique is based on the simple idea that the higher bit planes can be used for hiding information provided the data is hidden in seemingly complex regions.

In 2005, Yeuan-Kuen Lee and Ling-Hwei Chen [5] proposed a steganographic technique based on the LSB technique to hide a large amount of data. In this technique they used 4 LSB bits of the pixel to hide the secret data, and could therefore hide about 50% of the cover image data. The hiding algorithm of this technique is based on the contrast and luminance properties. They used three components to maximize the hiding capacity, minimize the embedding error and eliminate false contours. The advantage of this technique is that it maintains good perception in the cover image.

In 2005, Seppanen, Makela and Keskinarkaus [6] proposed a high capacity steganography technique to hide information in color images. The hiding algorithm of this technique can hide a large amount of data while lowering the level of noise. They used about 6 bits per pixel to hide the data, and could therefore hide more data than the previous techniques.

In 2007, Nameer [7] proposed a steganographic algorithm for hiding a large amount of data with high security. This technique tries to improve the efficiency of the payload. It uses adaptive image filtering and adaptive image segmentation, with bit replacement on the appropriate pixels. Those pixels are selected randomly by using a new concept, defined by main cases with their sub-cases for each byte in one pixel, which is based on visual and statistical properties. High security is provided to the secret message. Using this algorithm, they could hide about 75% of the cover image size with high output quality, and used the remaining 25% of the data for security.

CHAPTER III

DESIGN AND IMPLEMENTATION

3.1 Design

A good approach to image steganography should aim at concealing the highest amount of data possible in a cover image while maintaining imperceptibility, that is, an acceptable level of visual quality for the stego-image. The least significant bit scheme is one of the simplest and most easily applied data hiding methods; it directly embeds bits of secret data in the least significant bits of each image pixel. Variations of this technique rely on optimally replacing carefully chosen pixel bits with message bits to improve the image quality and to provide larger hiding capacity.

Images are the most widespread carrier medium used as they can offer high hiding capacity. However, altering the least significant bits of 8-bit images would produce poor quality stego-images. Therefore, 24-bit RGB color images are preferable as their color values can be directly modified without noticeable degradation in image quality. In order to provide for potentially more hiding space, it is noted that a 24-bit pixel value can be stored in 32 bits. With the emergence of sophisticated and widely available graphics editors, it is possible to create 32-bit images from 24-bit images, with the extra 8 bits, called the alpha channel, specifying transparency. The resulting image is referred to as an RGBA (RGB + Alpha channel) image. Applying the LSB substitution scheme to RGBA images will increase the embedding capacity since the 8-bit alpha channel can be utilized as well.
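As an illustration of LSB substitution on a single 8-bit component, the following minimal sketch replaces the n lowest bits of a value with payload bits and reads them back. The helper names `replace_lsbs` and `read_lsbs` are hypothetical and are not taken from the thesis.

```python
def replace_lsbs(value: int, payload: int, n_bits: int) -> int:
    """Overwrite the n_bits least significant bits of an 8-bit value."""
    mask = (1 << n_bits) - 1
    return (value & ~mask & 0xFF) | (payload & mask)


def read_lsbs(value: int, n_bits: int) -> int:
    """Return the n_bits least significant bits of a value."""
    return value & ((1 << n_bits) - 1)


# Hiding the 3-bit payload 0b010 in a component whose value is 255.
stego_value = replace_lsbs(0b11111111, 0b010, 3)
recovered = read_lsbs(stego_value, 3)
```

Replacing n bits changes the component by at most 2^n - 1, so three replaced bits alter an 8-bit value by no more than 7 out of 255.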

Another way to provide high hiding capacity is to reduce the size of the secret color image before embedding, which can be achieved by compression, quantization or other transformation functions. Our proposed technique relies on a transformation function based on the YIQ color model, which is the US television standard: Y provides the intensity, which is the signal used for black and white TVs, while I and Q encode chromaticity. The objectives of the YIQ system were to provide a signal that could be directly displayed by black and white TVs, and at the same time provide easy coding and decoding of RGB signals. The Y component conveys the luminance information and is transmitted on a separate carrier signal from the chromaticity components. It can be computed as a linear function of the RGB values as follows:

(3.1)

where $I_R$, $I_G$ and $I_B$ are the red, green and blue pixel values, respectively.

In the RGB color space, each color can be represented as a 3-tuple vector $I_S = (I_R, I_G, I_B)$. Color quantization can be used to reduce the $2^{24}$ possible colors by approximating the original pixels with their nearest color representative in a 256-entry color palette. However, the quality of the image is greatly affected by how accurately the constructed palette represents the possible colors in an image. In our approach, the transformation or encoding function must efficiently reduce the size of the image to be stored and at the same time provide an accurate representation of the original image when retrieved and decoded. The process mainly consists in producing new smaller RGB values, which can be obtained from the terms in equation 3.1 and then embedded in the least significant bits of each pixel in the RGBA cover image.

First, each term of equation 3.1 is extracted and divided by a reduction factor F, in order to obtain new smaller RGB values for the secret image pixels. It is very important that the new RGB values are very small so as to use a minimal number of LSB bits in the cover image. The factors F1, F2 and F3, applied to the red, green and blue pixel values respectively, are determined in such a way that the new RGB values are hidden in the cover image without any change in visual perception. They were obtained heuristically through thorough experiments. Experimental results show that F1, F2 and F3 can all be set to 5 without significant degradation in visual perception. Therefore, the three new RGB values of the secret image are obtained as follows, with the coefficients and the rounding taken from the pseudocode of figure 3.2:

$R_{new} = \mathrm{round}(0.299\, I_R / F_1)$, $G_{new} = \mathrm{round}(0.587\, I_G / F_2)$, $B_{new} = \mathrm{round}(0.299\, I_B / F_3)$ (3.2)
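The encoding step can be sketched as follows. This is a minimal illustration, not the thesis implementation: the coefficients (0.299, 0.587, 0.299) and the factor 5 are the ones that appear in the pseudocode of figure 3.2, while the names `COEFFS`, `FACTORS` and `encode_pixel` are hypothetical.

```python
# Coefficients as used in the pseudocode of figure 3.2 (note the blue term).
COEFFS = (0.299, 0.587, 0.299)
# Reduction factors F1, F2 and F3, all set to 5 in the text.
FACTORS = (5, 5, 5)


def encode_pixel(rgb):
    """Map an 8-bit (R, G, B) secret pixel to small (R_new, G_new, B_new)."""
    return tuple(round(c * v / f) for c, v, f in zip(COEFFS, rgb, FACTORS))


white = encode_pixel((255, 255, 255))   # largest encoded values
sample = encode_pixel((100, 150, 200))
```

The encoded values are deliberately coarse: each unit of the encoded red value covers roughly 5 / 0.299, or about 17, original intensity levels, which is where the later loss of precision comes from.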

Figure 3.1 summarizes our LSB-based approach and the procedure to embed an RGB image into an RGBA image.

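[Figure 3.1: block diagram. The secret color image (RGB) passes through the Y(r,g,b) encoding; the encoded values and the cover color image (RGBA) are the inputs of the LSB-based embedding algorithm, which outputs the stego-image.]

Figure 3.1 Proposed Hiding Procedure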

3.2 Algorithm and Implementation

The embedding algorithm takes a cover image and a secret image as inputs. The size of the secret image can be as large as that of the cover image. The new RGB values are then computed and hidden in the least significant bits of the cover image using the LSB technique.

3.2.1 Data Hiding Algorithm

Step 1: First, take a cover image of size M*N and a secret image of size up to M*N as the inputs.

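Step 2: Take the RGB values from each pixel of the secret image and transform them into new RGB values according to the encoding function. In order to determine the number of bits needed to store the encoded RGB values, we need to compute the maximum value of each term for the vector $I_S$, which is reached at the maximum pixel value of 255, as shown below:

$0.299 \times 255 = 76.245$, $0.587 \times 255 = 149.685$, $0.299 \times 255 = 76.245$

Then divide the above terms by the factors F1, F2 and F3, which are all equal to 5:

$76.245 / 5 = 15.249$, $149.685 / 5 = 29.937$, $76.245 / 5 = 15.249$

Therefore, after rounding, the maximum values for the encoded RGB secret image are:

$R_{new}^{max} = 15$, $G_{new}^{max} = 30$, $B_{new}^{max} = 15$

Now since $R_{new}^{max}$ is 15, 4 bits are needed to embed the encoded red pixel of the secret image. Similarly, $G_{new}^{max}$ is 30 and $B_{new}^{max}$ is 15; therefore 5 bits and 4 bits are needed to embed the green and blue pixels, respectively. So the 24-bit color pixel can be effectively represented as a 13-bit pixel.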
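The bit counts in Step 2 can be checked with a short computation. This is a sketch under the same assumptions as the pseudocode of figure 3.2 (coefficients 0.299, 0.587, 0.299 and a factor of 5); the helper name `bits_needed` is hypothetical.

```python
COEFFS = (0.299, 0.587, 0.299)  # coefficients from the pseudocode of figure 3.2
FACTOR = 5                      # F1 = F2 = F3 = 5
MAX_PIXEL = 255                 # largest 8-bit component value


def bits_needed(max_value: int) -> int:
    """Number of bits required to store integers in the range 0..max_value."""
    return max(1, max_value.bit_length())


max_encoded = [round(c * MAX_PIXEL / FACTOR) for c in COEFFS]
bits = [bits_needed(m) for m in max_encoded]
total_bits = sum(bits)
```

Here `max_encoded` evaluates to the maxima quoted in the text, and `total_bits` is the 13-bit pixel size.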

Step 3: Finally, the $R_{new}$, $G_{new}$ and $B_{new}$ values need to be embedded into the cover image. The LSB substitution technique is applied. Each pixel of the cover image uses 32 bits, including the 8-bit alpha channel. The number of lower bits used for hiding the 13-bit pixel from the secret image is distributed among the R, G, B and A components of the cover image as follows:

Alpha Red Green Blue

1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1

This method uses 3 lower bits from alpha, red, green values and 4 bits for blue values.

However, the $R_{new}$ value of the secret image requires 4 bits; therefore it uses the 3 LSB bits of the alpha value and one bit of the red value of the cover image.

Alpha Red

1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1

The $G_{new}$ value of the secret image requires 5 bits; therefore it uses 2 lower bits of the red value and 3 lower bits of the green value of the cover image.

Red Green

1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1

The $B_{new}$ value of the secret image requires 4 bits; therefore it uses the 4 lower bits of the blue value of the cover image.

Blue

1 1 1 1 1 1 1 1

The pseudocode for the entire embedding procedure is shown in figure 3.2.

Pseudo Code

//read pixel data from the secret image; pixel1 is the secret pixel at (i, j)
for (i = 0; i < image width; i++) {
    for (j = 0; j < image height; j++) {

        //hide red pixel value of secret image in cover image
        new red value of secret image = Math.Round((0.299 * pixel1.R) / 5);
        convert this red value into binary bits
        hide these bits into the LSBs of the RGBA cover image

        //hide green pixel value of secret image in cover image
        new green value of secret image = Math.Round((0.587 * pixel1.G) / 5);
        convert this green value into binary bits
        hide these bits into the LSBs of the RGBA cover image

        //hide blue pixel value of secret image in cover image
        new blue value of secret image = Math.Round((0.299 * pixel1.B) / 5);
        convert this blue value into binary bits
        hide these bits into the LSBs of the RGBA cover image
    }
}

Figure 3.2 Pseudo Code for the Proposed Embedding Algorithm
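The bit distribution of Step 3 can be sketched for a single pixel as follows. The text fixes how many bits go into each component but not their exact positions, so the ordering below is an assumption: the three upper bits of the encoded red value go into the alpha LSBs and its lowest bit into bit 2 of red, while the two upper bits of the encoded green value go into bits 1 and 0 of red. The function name `embed_pixel` and the (A, R, G, B) tuple order are also hypothetical.

```python
def embed_pixel(cover_argb, encoded_rgb):
    """Hide a 13-bit encoded secret pixel in one (A, R, G, B) cover pixel."""
    a, r, g, b = cover_argb
    r_new, g_new, b_new = encoded_rgb
    if not (0 <= r_new < 16 and 0 <= g_new < 32 and 0 <= b_new < 16):
        raise ValueError("encoded values must fit in 4, 5 and 4 bits")
    # Alpha: 3 LSBs carry the three upper bits of the encoded red value.
    a_out = (a & 0b11111000) | (r_new >> 1)
    # Red: bit 2 carries the last red bit, bits 1..0 the two upper green bits.
    r_out = (r & 0b11111000) | ((r_new & 1) << 2) | (g_new >> 3)
    # Green: 3 LSBs carry the three lower bits of the encoded green value.
    g_out = (g & 0b11111000) | (g_new & 0b111)
    # Blue: 4 LSBs carry the whole encoded blue value.
    b_out = (b & 0b11110000) | b_new
    return (a_out, r_out, g_out, b_out)


stego = embed_pixel((255, 255, 255, 255), (6, 18, 12))
```

Under this layout the alpha, red and green components change by at most 7 and the blue component by at most 15.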

3.2.2 Extraction Algorithm

The process to extract and decode the secret image from the stego-image is shown in figure 3.3. It follows the reverse process of the hiding algorithm to obtain the secret image. First, the encoded RGB values are retrieved from the least significant bits of the stego-image pixels. The original RGB values of the secret image are then obtained by inverting the encoding function, as detailed in the steps below.


[Figure 3.3: block diagram. Stego image, then extraction of the LSB bits from the stego image, then application of the inverse function to the extracted (r,g,b) values, producing the secret image.]

Figure 3.3 Image Extraction System

Step 1: Take the stego image and extract the corresponding number of lower bits in the alpha, red, green and blue values.

Step 2: To extract the red value of the secret image, concatenate the 3 lower bits of the alpha value with one upper bit of the red value.

Alpha Red

1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1

After obtaining the 4 bits, convert them to an integer value Rcs. Multiply Rcs by the factor F1, i.e. 5, then divide the resulting value by 0.299 to obtain the red value of the secret image.

Red value of the secret image = $(R_{cs} \times F_1) / 0.299$

Similarly, to retrieve the green value of the secret image, first concatenate 2 lower bits of the red value with the 3 lower bits of the green value.

Red Green

1 1 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1

After obtaining the 5 bits, convert them to an integer value Gcs, which is then multiplied by the factor F2; then divide the resulting value by 0.587.

Green value of the secret image = $(G_{cs} \times F_2) / 0.587$

The blue value of the secret image can be obtained by extracting the 4 lower bits of the blue value of the stego image pixel.

Blue

1 1 1 1 1 1 1 1

After obtaining those 4 bits, convert them into an integer value Bcs, which is multiplied by the factor F3; then divide the resulting value by 0.299.

Blue value of the secret image = $(B_{cs} \times F_3) / 0.299$

Finally set these red, green and blue values for each pixel to obtain the secret image.
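The extraction steps can be sketched for one pixel as follows. As with any per-pixel sketch of this scheme, the exact bit positions are an assumption: the encoded red value is taken from the three alpha LSBs followed by bit 2 of red, and the encoded green value from bits 1 and 0 of red followed by the three green LSBs. Clamping the decoded values to 255 is also an assumption, added because 30 * 5 / 0.587 slightly exceeds 255. The names `extract_encoded` and `decode_pixel` are hypothetical.

```python
COEFFS = (0.299, 0.587, 0.299)  # divisors named in the extraction steps
FACTORS = (5, 5, 5)             # F1, F2 and F3


def extract_encoded(stego_argb):
    """Read (Rcs, Gcs, Bcs) from the low bits of an (A, R, G, B) stego pixel."""
    a, r, g, b = stego_argb
    r_cs = ((a & 0b111) << 1) | ((r >> 2) & 1)
    g_cs = ((r & 0b11) << 3) | (g & 0b111)
    b_cs = b & 0b1111
    return (r_cs, g_cs, b_cs)


def decode_pixel(encoded):
    """Invert the encoding: multiply by the factor, divide by the coefficient."""
    return tuple(
        min(255, round(v * f / c)) for v, f, c in zip(encoded, FACTORS, COEFFS)
    )


encoded = extract_encoded((251, 250, 250, 252))
secret_pixel = decode_pixel(encoded)
```

The decoded pixel is only an approximation of the original secret pixel, since the rounding applied during encoding cannot be undone.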

CHAPTER IV

METRICS, RESULTS AND DISCUSSIONS

4.1 Performance Metrics

A series of experiments was conducted to show the effectiveness of the proposed technique. The efficiency of the proposed technique is measured by four metrics which are

1. Amount of secret data hidden in the cover image.

2. Euclidean distance of the cover and stego images.

3. Brightness information of the cover and stego images.

4. PSNR (Peak Signal-to-Noise Ratio) of cover and stego images.

The objective of measuring the amount of data hidden in the cover image is to show that the proposed technique can hide a large amount of data, which will be compared to results obtained using the S-Tools [19] technique.

The Euclidean distance of the cover and stego images is used to show that there is not much change in the color perception of the two images. For palette based images, the Euclidean norm is the measure used to find the closest palette color. The objective of this measure is to check that a pixel of the cover image with a particular palette color is not changed to a completely new palette color. If the Euclidean distance is very high, then there is a possibility that pixels have been changed to new palette colors. The Euclidean norm formula shown in (4.1) is derived from the Euclidean distance formula for palette based images and applied to the RGB images to quantify the degree of change in color between the cover and stego images. It calculates the distance between color pixels of the cover and stego images [7].

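$d = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2}$ (4.1)

where $(R_1, G_1, B_1)$ and $(R_2, G_2, B_2)$ are the color values of corresponding pixels in the cover and stego images.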
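The Euclidean distance between corresponding color pixels, as described above, can be sketched as follows. Averaging the per-pixel distances over the image is an assumption made here for illustration; the names `color_distance` and `mean_color_distance` are hypothetical.

```python
import math


def color_distance(p, q):
    """Euclidean distance between two (R, G, B) pixels."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mean_color_distance(cover_pixels, stego_pixels):
    """Average per-pixel distance between two equally sized pixel lists."""
    if len(cover_pixels) != len(stego_pixels) or not cover_pixels:
        raise ValueError("images must be non-empty and of equal size")
    total = sum(color_distance(p, q) for p, q in zip(cover_pixels, stego_pixels))
    return total / len(cover_pixels)


cover = [(255, 255, 255), (10, 20, 30)]
stego = [(250, 250, 252), (10, 20, 30)]
average = mean_color_distance(cover, stego)
```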

The brightness information of the cover and stego images is used to compare the perception of the cover and the stego images in terms of intensity. It is computed using equation 4.2. This measure is used to characterize robustness against visual attack on the stego image [12], that is, whether the change in image quality is small enough that it is difficult for an attacker to notice the presence of secret information hidden in the cover image.

Brightness = (4.2)

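The PSNR (peak signal-to-noise ratio) is used to measure the quality of the stego image compared to the cover image. It is calculated using equation 4.3, where MSE, defined in equation 4.4, refers to the mean square error.

$PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right)$ (4.3)

$MSE = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left(C(i,j) - S(i,j)\right)^2$ (4.4)

where $C(i,j)$ and $S(i,j)$ are the pixel values of the cover and stego images of size M*N, and $MAX$ is the maximum possible pixel value (255 for 8-bit components).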

The quality of the image is higher if the PSNR value is high. Since PSNR decreases as the MSE increases, a higher PSNR value corresponds to a lower MSE value. Therefore, the better the stego image quality, the lower the MSE value will be.
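The relationship between MSE and PSNR can be sketched with the standard definitions. The sketch assumes 8-bit components (peak value 255) and images flattened into lists of component values; the function names `mse` and `psnr` are hypothetical.

```python
import math


def mse(cover, stego):
    """Mean square error between two equally sized lists of component values."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)


def psnr(cover, stego, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


# Every component off by one gives an MSE of 1, i.e. about 48.13 dB.
quality = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

Because of the logarithm, halving the MSE raises the PSNR by about 3 dB rather than doubling it.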

4.2 Experimental Setup

The simulation for the experiments was set up and run on a Windows XP Professional machine with a 1.8 GHz processor and 1 GB of RAM. In the experiments we used 82 images to test the proposed technique. Table 4.1 shows the classification of the images. Figures 4.1 to 4.5 show five standard images used in our experiments [20-24]. The Euclidean distance and brightness information values are expressed as average measurements over all images.

Table 4.1 Dataset of Images Used in Experiments

Image   Types of images      Size (M*N)                # images   Characteristics
JPEG    Small size images    < (100*100)               12         Compressed images, consisting of variable
        Medium size images   (100*100) to (400*400)    12         colored palette images, densely colored
        Large size images    > (400*400)               (12+5)     palette images and maps.
BMP     Small size images    < (100*100)               12         Uncompressed images, consisting of variable
        Medium size images   (100*100) to (400*400)    12         colored palette images, densely colored
        Large size images    > (400*400)               (12+5)     palette images and maps.

Figure 4.1 Blue Hills

Figure 4.2 Sunset


Figure 4.3 Flower


Figure 4.4 Lena


Figure 4.5 Map

4.3 Amount of Secret Data Hidden in Cover Images

Table 4.2 shows the amount of data hidden in the standard images using the proposed technique and the S-Tools algorithm. Figure 4.6 displays two bars per image, comparing the results obtained with the S-Tools technique to those of the proposed technique. The x-axis in the graph represents the cover images while the y-axis represents the amount of data hidden in the image. The proposed technique has the capability of embedding a secret image of up to the same size as the cover image. In fact, experimental testing includes using the secret image as the cover image to show exactly 100% of data hiding. The S-Tools technique uses only one bit per color component (three bits per pixel) to embed data, and can therefore only hold a payload of about 12%.
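The capacity figures can be related to the number of LSBs used with a short calculation. This sketch assumes a 24-bit cover pixel for S-Tools and a 32-bit RGBA cover pixel for the proposed technique; the function name `raw_bit_capacity_percent` is hypothetical.

```python
def raw_bit_capacity_percent(bits_used: int, bits_per_pixel: int) -> float:
    """Share of the cover pixel's bits that carry hidden data, in percent."""
    return 100.0 * bits_used / bits_per_pixel


# S-Tools: one LSB per color component of a 24-bit pixel.
s_tools = raw_bit_capacity_percent(3, 24)
# Proposed technique: 13 LSBs of a 32-bit RGBA pixel.
proposed_raw = raw_bit_capacity_percent(13, 32)
```

The first value is 12.5%, close to the payload reported for S-Tools. The 100% figure reported for the proposed technique is counted in secret pixels per cover pixel: each 24-bit secret pixel is first reduced to 13 bits by the encoding function, so one complete secret pixel fits into the 13 LSBs of each cover pixel.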

Table 4.2 Amount of Data Hidden in Stego Images

Images       Size of the image (bytes)   % of data hidden (S-Tools technique)   % of data hidden (proposed technique)
Blue Hills   562554                      12.4                                   100
Sunset       230454                      12.4                                   100
Flower       737334                      12.4                                   100
Lena         786486                      12.4                                   100
Map          709854                      10.4                                   100

Figure 4.6 Amount of Data Hidden in Cover Images

4.4 Euclidean Distance of the Cover and Stego Images

Figure 4.7 shows the Euclidean norm of the cover and stego images. The flower image has the highest Euclidean norm because it has a large amount of uniform palette color, while the Lena image has the lowest Euclidean norm because it has a small amount of palette color compared to the other images. Medium-size images have the highest values, which are given as the average of 12 medium-size images. Similarly, averages are given for small images, with sizes ranging from 20*20 to 99*99, and for large images of size greater than 400*400. The large images hiding 100% of data show low Euclidean values, which means that this technique is much better at hiding large amounts of data.

One reason for the high values for small- and medium-size images is that they do not utilize all the pixels of the cover image.


Figure 4.7 Euclidean Distance of Cover Image and Secret Images

4.5 Brightness Information of the Cover and Stego Images

Figure 4.8 displays four bars: one for the cover image shown in figure 4.9, and three for the stego-image shown in figure 4.10 with 100% data, medium-size images and small-size images embedded. Since all the bars are almost the same, there is very little change in quality between the cover and stego images. Therefore, there is less opportunity for a visual attack on the stego image.

Figure 4.8 Brightness Information of Cover and Stego Images

Figure 4.9 Cover Image


Figure 4.10 Stego Image

4.6 PSNR of the Cover and Stego Images

Figure 4.15 shows the PSNR values of the cover and stego images obtained from the S-Tools technique and the proposed technique. The blue bar refers to the PSNR values of the cover and stego images obtained with the S-Tools technique, the red bar refers to the PSNR values of the cover and stego images obtained with the proposed technique, and the green bar refers to the PSNR values of the secret image and the retrieved secret image.

This figure shows a degradation in image quality relative to the results obtained with S-Tools. This was expected, as the proposed technique modifies more LSB bits than the S-Tools technique. However, the results show that the changes in the stego image are not perceptible to the human eye.

4.7 Disadvantages of the Proposed Technique

The retrieved secret image incurs a small loss of data relative to the original secret image as a result of the transformation. This is due to the rounding of the encoded pixel values, e.g. rounding 0.5 up to 1.0 or 0.1 down to 0.

The approximate range of each pixel value of the secret image after retrieval is as follows, where Rs, Gs and Bs are the original values and Rs', Gs' and Bs' are the retrieved values:

Red pixel value (Rs-8) < Rs' < (Rs+8)

Green pixel value (Gs-5) < Gs' < (Gs+5)

Blue pixel value (Bs-8) < Bs' < (Bs+8)
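The size of the rounding loss can be checked by encoding and decoding every 8-bit value. This is a sketch under the assumptions used in the pseudocode of figure 3.2 (coefficients 0.299 for red and blue, 0.587 for green, factor 5), with decoded values rounded to integers and clamped to 255, which is an added assumption. The function name `worst_round_trip_error` is hypothetical.

```python
def worst_round_trip_error(coeff: float, factor: int = 5) -> int:
    """Largest |decoded - original| over all 8-bit values for one component."""
    worst = 0
    for original in range(256):
        encoded = round(coeff * original / factor)
        decoded = min(255, round(encoded * factor / coeff))
        worst = max(worst, abs(decoded - original))
    return worst


red_or_blue_error = worst_round_trip_error(0.299)
green_error = worst_round_trip_error(0.587)
```

The worst case is half a quantization step, factor / (2 * coeff), which is about 8.4 levels for red and blue and about 4.3 levels for green, in line with the approximate ranges given above.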

Figure 4.13 shows the Euclidean distance between the original secret image (Figure 4.11) and the retrieved secret image (Figure 4.12). This graph demonstrates that there is a negligible amount of data loss in the image. Figure 4.14 shows the brightness information of the original and retrieved secret images, which demonstrates that there is not much change in the perception of the secret image after it is retrieved from the stego image.


Figure 4.11 Secret Image


Figure 4.12 Retrieved Secret Image


Figure 4.13 Euclidean of Original Secret Image and Retrieved Secret Image

Figure 4.14 Brightness Information of Original Secret Image and Retrieved Secret Image


Figure 4.15 PSNR of the Cover and Stego Images

Table 4.3 PSNR Table of the Cover and Stego Images

S-Tools   Proposed Technique (Cover Image)   Proposed Technique (Secret Image)
67.1      36.5                               36.2
63.2      35.8                               36
68.1      34.5                               35.5
68.5      36                                 36.1
67        36.1                               36.2

CHAPTER V

CONCLUSION AND FUTURE WORK

5.1 Conclusion

In this work, we have presented a technique that allows hiding a color image (secret object) in another color image (cover object), where both images may be of the same size, therefore achieving up to 100% payload. It is based on the simple and popular Least Significant Bit substitution technique. It was extended to take into account the alpha channel in RGBA images, which are used as cover images. In addition, a transformation function based on the conversion from the RGB color space into the YIQ color space was used to reduce the size of the secret image before embedding. Combining those techniques allows us to satisfy our initial objectives of providing a way to embed a large amount of secret data while maintaining imperceptibility.

With continued research and improvement in algorithm design, steganography can be taken as a serious means to hide data, and the present work appears to be more efficient in hiding data (payload) than the algorithm used in S-Tools [19]. We performed four types of comparison. The first one compared the present algorithm with the S-Tools algorithm through the amount of data that can be hidden. The second and third comparisons addressed statistical attack; they show that it is difficult to distinguish between the cover object and the stego object, as found by computing the Euclidean distance and the brightness information. Finally, the last comparison used the PSNR value, which indicates that changes in the stego-image are not perceptible to the human eye.

5.2 Future Work

In the proposed technique, the results show that there is some loss of data in the retrieval of the secret image (secret object). The loss is due to rounding, e.g. from 0.5 to 1.0 or from 0.1 to 0, during the embedding algorithm. However, the image can afford to lose some data and still retain good quality. This is the main reason that we could hide a large amount of data and achieve up to about 100% payload.

Future work on this topic includes the design of an algorithm in which there is no loss of data in the secret image, and which provides security to the secret image.

REFERENCES

1. Fabien A. P. Petitcolas, Ross J. Anderson and Markus G. Kuhn, “Information hiding - Survey”, IEEE, 87(7):1062-1078, 1999.

2. G. J. Simmons, “The Prisoner’s problem and the subliminal channel in Advances in Cryptology”, Proc.crypto ’83:55–67, 1983.

3. Udit Budddia and Deepak Kundur, “Digital video steganalysis exploiting collusion Sensitivity”,IEEE, 1(4):502-516, 2006.

4. Furuta, T., Noda, H., Niimi, M., Kawaguchi, E., “Bit-plane decomposition steganography using wavelet compressed video”, Joint Conference of the Fourth International Conference, 2(5):970-974, 2003.

5. V.Karthekayani and kammalakan, “Conversion grayscale image to color image with and without texture synthesis”, International journal of computer science and network security, 7(4):11-16, 2007.

6. Eiji Kawaguchi and Richard O. Eason, “Principle and applications of BPCS- Steganography”, Proc. SPIE ,3528: 464-473 , 1999.

7. Nameer N. EL-Emam, “Hiding a Large Amount of Data with High Security Using Steganography Algorithm” Jordan Journal of Science publications, 3 (4): 223-232, 2007.

8. K B Raja, C R Chowdary K R Venugopal, “A Secure Image Steganography using LSB, DCT and Compression Techniques on Images”, IEEE, 170-176, 2005.

9. Naofumi,” Technique of lossless steganography”, IEICE Transactions on Communications, 90(11):1-4, 2007.

10. Nan jiang and wan jiang, “Random oracle model of information modeling” , World academy of science, 18:1307-6884, 2006

11. G. Pulcini, “Stegotif” [http://www.geocities.com/SiliconValley/9210/gfree.html, 10/28/2008]

12. Vishal, Wilson and Bryon, “Linear, color separable human visual system model for vector diffusioning system”, Journal of Electronic Imaging, 1:277-292, 1992.

13. Ying Wang; Moulin, P,”Statistical Signal Processing” , IEEE, 56(11) : 339 – 342, 2003.

14. Mastronardi, G.; Castellano, M. Marino, “Intelligent Data Acquisition and Advanced Computing Systems” IEEE,11:116 – 119, 2003.

15. M. Shirali-Shahreza and M.H. Shirali-Shahreza, “An Improved Version of Persian/Arabic Text Steganography Using "La” Word’” Proceedings of the 6th National Conference on Telecommunication Technologies 2008 (NCTT 2008), Putrajaya, Malaysia, August 26–28, 2008.

16. Bmp format [http://local.wasp.uwa.edu.au/~pbourke/dataformats/bmp/, 10/28/2008]

17. Porter, Thomas; Tom Duff, “Compositing Digital Images”,.Computer Graphics 18 (3): 253–259, 1984.

18. Alvy Ray Smith, “Alpha and the History of Digital Compositing”, Microsoft Tech Memo, 7:08-15, 2005.

19. S-tools[http://www.spychecker.com/program/stools.html, 10/28/2008]

20. Bluehills [http://www.porathcontractors.com/gallery/albums/projects/Blue_hills.sized.jpg, 10/28/2008]

21. Sunset [http://pics.ww.com/d/199-2/Sunset_bmp.jpeg, 10/28/2008]

22. Flower [http://www.hlevkin.com/TestImages/flower.bmp, 10/28/2008]

23. Lena [http://www.bilsen.com/aic/tests/lena/lena.bmp, 10/28/2008]

24. Map [http://www.blindsociety.com/blindspot/wp-ontent/uploads/2007/08/zip-code-map.bmp, 10/28/2008]

25. Steganography [http://www.cimitan.com/blog/wp-content/murrine_rgba.png]

26. Transparent image [http://dl.ambiweb.de/mirrors/www.libpng.org/pub/png/img_png/imgcomp-440x330.png, 10/28/2008]
