AN INVESTIGATION OF SCALABLE VECTOR GRAPHICS AS A COVER MEDIUM FOR
STEGANOGRAPHY
By
Gerard J. Hungerman
Submitted to the
Faculty of the College of Arts and Sciences
of American University
in Partial Fulfillment of
the Requirements for the Degree of
Master of Science
In
Computer Science
Chair:
Dr. Michael Gray
Dr. Angela Wu
Dr. Dean McCullough
Dean of the College of Arts and Sciences
Date
2006
American University
Washington, D.C. 20016
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. UMI Number: 1439943
UMI Microform 1439943 Copyright 2007 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company 300 North Zeeb Road P.O. Box 1346 Ann Arbor, MI 48106-1346
AN INVESTIGATION OF SCALABLE VECTOR GRAPHICS AS A COVER MEDIUM FOR
STEGANOGRAPHY
BY
Gerard J. Hungerman
ABSTRACT
Cryptography is normally used when security is required in Internet communications, even
though its use makes obvious to outside observers that communications are sensitive. If the true
desire is for the very existence of sensitive communication to go undetected, then steganography is
the answer. With steganography, data can be hidden in ordinary digital files such that any unwanted
interceptors of communications are likely to think that all they have are normal messages.
A file format that has promise of becoming widely used on the Internet is Scalable
Vector Graphics (SVG), which supports the delivery of two-dimensional graphical images with
rich content. This work proposes algorithms for embedding information into SVG and implements
three different approaches, each with variations to further explore the nature of the embedding
techniques. Preliminary testing has been conducted to show the viability of SVG as a cover medium
for steganography.
ACKNOWLEDGEMENTS
I would like to acknowledge everyone who helped me to reach this point in my academic
career. I would like to thank the faculty of the American University Computer Science Department,
Dr. Michael Gray and Dr. Angela Wu, for their support and counsel. I would especially like to
thank Dr. Dean McCullough not only for his advice and review of this work, but also for his
understanding and friendship through the process. Most of all, I want to thank anyone who ever
reached out to me when I was in need, whether it was to help at work, home, or to just motivate me
to complete this body of work. Thank you Elise for your love and support through this balancing
act. Last but certainly not least, I love my family beyond words, and cannot thank them enough
for their love and support.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF ILLUSTRATIONS
LISTINGS
CHAPTER
1. INTRODUCTION
2. BACKGROUND
2.1 Steganography
2.1.1 Principles of Steganography
2.1.2 Techniques
2.1.3 Steganalysis
2.1.4 Attacks
2.1.5 Security and Robustness
2.2 Scalable Vector Graphics
2.2.1 Raster Images
2.2.2 Vector Images
2.2.3 The Scalable Vector Graphics (SVG) Format
3. COVERT CHANNELS IN SVG
3.1 Embedding Methods
3.1.1 Insertion
3.1.2 Substitution
3.1.3 Tweaking
3.1.4 Other Embedding Approaches
3.2 Implementation
3.2.1 Inserting White Space
3.2.2 Adjusting Least Significant Figures
3.2.3 Reordering Information
3.3 Test Results
3.3.1 Bandwidth
3.3.2 Visual and Structural Effects
3.3.3 Compression Testing
3.3.4 Statistical Analysis
4. CONCLUSIONS AND FUTURE WORK
APPENDIX
A. SVG FILES USED IN TESTING
B. SOURCE CODE
B.1 Overview
B.1.1 Inserting White Space
B.1.2 Adjusting Least Significant Figures
B.1.3 Reordering Information
B.1.4 Combining Methods
B.2 Source Code Listings
REFERENCES
LIST OF TABLES
Table
3.1. Attributes Allowing Adjustment
3.2. Algorithms and Tools Tested
3.3. Bandwidth in Initial Testing
3.4. Bandwidth in Testing of alg_TweakTwoNumsMinErr
3.5. Compression Testing Results
3.6. Confidence Values that Cover and Stego are Different
A.1. SVG Files Used in Testing
LIST OF ILLUSTRATIONS
Figure
2.1. Hello World SVG Example Image
3.1. Plot of RMS vs. Bandwidth
3.2. Visual Effects of Tweaking
A.1. animated_bustrack.svg After Animation
A.2. animated-plane.svg During Animation
A.3. gaenseblume.svg During Animation
A.4. krokus.svg During Animation
A.5. Vienna.svg
A.6. world.population.svg
LISTINGS
Listing
2.1 Hello World SVG Example Source
B.1 SVGSteg.java
B.2 StegMethod.java
B.3 StegAlgorithm.java
B.4 InfoGen.java
B.5 TextManip.java
B.6 SVGTag.java
B.7 SVGAttribute.java
B.8 TagProcessor.java
B.9 LeadTab.java
B.10 LeadTabSqOff.java
B.11 TweakNums.java
B.12 TweakTwoNums.java
B.13 TweakTwoNumsMinErr.java
B.14 OrderEncoder.java
B.15 ReOrderElements.java
B.16 OrderedSVGDoc.java
B.17 OrderedSVGTag.java
B.18 SVGDocElem.java
B.19 ReOrderAtts.java
B.20 AllMethods.java
CHAPTER 1
INTRODUCTION
The rise of the Internet has provided a medium of communication where large amounts of
data can be sent around the world faster than ever before. A person from New Zealand, one from
India, and another from the United States can hold a group chat from the comfort of their respective
homes. Even those with limited speech rights have found a potential outlet. Looking to bring part
of the global economy home, countries with oppressive governments have embraced the Internet for
e-commerce, but in turn have not allowed their people full access to the World Wide Web [54]. Certain
“restricted” content is blocked from view, and people’s communications are monitored. The fight
to view any Internet content, now to include increasingly popular blogs, continues today. In some
cases, anonymizers and foreign proxy servers do the trick until they are found out and blocked. The
more powerful the government, the more sophisticated the blocking technology [26].
As for communications monitoring, the immediate answer that comes to mind is to use
cryptography. It has been the answer for ensuring the protection of credit card numbers and other
sensitive information in transit, to include the digital signatures on new eCheck technology [2].
There are some major problems though with using cryptography alone for the purpose of evading
oppressive government monitoring. First and foremost, the technology is typically banned in these
countries, and the United States, not wanting this technology in the hands of its enemies, has strict
export controls on it. Another important point is that even if a person under a monitoring regime
were to get their hands on a cryptographic engine, any messages sent using it would be flagged
as suspicious, giving the government reason to monitor the sender. Even if the sender manages to
accomplish anonymity, the message could still be easily blocked.
Therefore, the requirement is further refined to be secretive communication such that
outside monitoring does not recognize that secretive communications are taking place. For example,
an e-mail detailing a vacation experience to a friend with amateur photography attached draws much
less attention than one containing random cipher text, yet the vacation experience can contain
much more than meets the eye. With steganography, data can be hidden in ordinary digital files
such that any unwanted interceptors of the message are likely to think that all they have is a
normal message, and not spend time trying to crack something that in their mind does not exist.
Hence, cryptography and steganography work well together, the former scrambling a message into
randomness and the latter tucking it safely away where no one, and hopefully no machine, can see
it. With steganography added to the arsenal, a sender can use open communication lines for covert
communication.
Steganography is certainly not a new idea, as examples of its use date back to antiquity.
The Greek Histiaeus, while being held prisoner under the Persian king Darius Hystaspis, was said
to have warned his countrymen of Persia’s intent to conquer them through a secret message [24]. He
did so by shaving the head of a trusted slave, tattooing the message on his head, and then waiting
for his hair to grow back before sending him off. The slave could easily travel to the recipient as he
was carrying no messages, at least in plain view. Many other interesting techniques for hiding the
existence of information have been used since. Today, with the use of the modern computer and the
Internet, methods that use rudimentary mathematics and take advantage of a file type’s storage
format can be employed to hide data in audio, video, image, and even executable files. In [12], the
author claims to have randomly downloaded 500 JPEG images from eBay and found, using his own
analytic methods, that over 150 had secret data embedded in them.
Given the variety of file formats and communication protocols that exist today, many tools
have been developed to hide information in digital content [28], with the vast majority of them
dealing with image files. This is due to their extensive use over the Internet.
A file format that has promise of becoming widely used on the Internet is Scalable Vector
Graphics (SVG). Based on the Extensible Markup Language (XML), SVG supports the delivery
of two-dimensional graphical images with rich content for display on a variety of devices. Given
the ability to create SVG dynamically on a server, the potential for embedding information is
great. This work proposes algorithms for embedding information into SVG and implements three
different approaches, each with variations to further explore the nature of the embedding techniques.
Preliminary testing has been conducted to show the viability of SVG as a cover medium for
steganography.
CHAPTER 2
BACKGROUND
This chapter provides an overview of both steganography and Scalable Vector Graphics
(SVG) for an understanding of the information hiding techniques presented in the next chapter.
2.1 Steganography
Steganography is a form of information hiding. Information hiding techniques may be used
for numerous reasons, although there are three major concerns being addressed today. The first is
the augmentation of data in a file for the benefit of the intended recipient, the second is the tamper-
proofing of a file to implement copyright restrictions, and the third is the concealment of sensitive
data to be sent across an open medium—as in steganography. Each information hiding technique
addresses one of these concerns. Each concern requires a different level of care, and therefore incurs
a certain level of difficulty in implementation.
The least constrained concern of the three is the augmentation of data in a file for a user’s
benefit. In this case, the file format specification can be designed to hold the information so that
a program that reads the file can understand and make use of it. An example of this can be found
in the digital storage of audio. The MP3 file format stores compressed audio data so that songs,
audio books, or other sounds may be more easily transferred over communication networks. These
files also carry extra information, not necessarily “hidden” per se, but areas where data other than the audio itself is stored. This data—the title of the audio, the composer or artist, if applicable the
album where the song can be found, etc.—augments the accompanying audio for the benefit of the
user of the file. The only real constraints are that the extra data does not significantly affect the
size of the file or the quality of the audio.
On the other hand, the most constrained concern requiring the application of information
hiding techniques is digital watermarking [13], which is the injection of hidden data into a file to
distinguish it from copies of itself. Since most computer users today have the ability to make digital
copies of music with practically no loss of quality, industry concerns of needing proper copyright
protections have intensified and hence fueled studies into this area. The trick is to have no (or an
undetectable) loss of quality to the file containing the watermark. Additional requirements of a
digital watermark are the need for it to resist its removal and remain intact through conversion,
compression, and transmission. If the watermark can somehow be removed, it should be at the
cost of the file degrading in quality or being rendered useless. This requirement does not lend itself to the
incorporation of the watermark into the file format specification, since it would more than likely not
be able to resist conversion, and could be easily found and erased or altered without consequence to
the file. Therefore, digital watermarking requires a more creative solution that can depend on the
format of the file (more powerful watermarking techniques do not), but does not include explicitly
adding anything to its storage specification.
The constraints of steganography fall in between those of augmentation and digital
watermarking. Steganography requires the same creativity as digital watermarking, but refers
to the implementation of secretive communication between two or more parties where the fact that
communication is taking place is unknown to anyone else. The idea is, as with digital watermarking,
to have no (or an undetectable) loss of quality to the file containing the secret data. Therefore,
an outside party could access the file and never know that it contained extra information. Since,
unlike a digital watermark, the secret data should not be known to exist, steganography does
not necessarily require the same level of resistance to removal. What it does require however is
resistance to methods that try to detect its use on a file.
2.1.1 Principles of Steganography
Steganography is basically the science of hiding information inside of other information
where only the intended recipient(s) know of its existence and can find it. There are three different
components to it: the information to hide, a cover object, and a stego object. The information to
hide can be anything, although most methods have an upper limit to the amount of information
that can be hidden. Hence, text messages are usually easier to hide than high-resolution images or
video. Some techniques hide only one bit of information: for example a 1 for true or a 0 for false.
It may not sound like much, but take for instance the following example. Suppose a secret society
has two places of meeting, and the decision of which to use from month to month is made by the
president the week of the meeting. This is so that he can survey throughout the earlier part of the
month which place will be safest. Since this is a secret society, the president does not want to be
seen communicating this information to any of the members, nor use open lines of communication.
Therefore, he uses a third-party bulletin board or Web site that all of the members would be visiting
anyway, and embeds within a picture on that site either a 0 to meet at one place or a 1 to meet at
the other.
In a steganographic system the cover object is a vehicle for the hidden information. Today,
image files, audio files, and compressed files are all used as cover objects, although the set of
possible cover objects is not limited to these. In the use of the most common steganography tools,
the information to hide is input along with a cover object, and the output is a stego object. A
stego object is simply a cover object that has secret information hidden in it. The ultimate goal
of steganography can then be described as embedding secret information in cover objects such
that given an arbitrary set of cover objects, there is no way to tell which, if any, contain secret
information. As with anything, there are exceptions to this process. In some steganographic
algorithms there is no cover object, where only the information to hide is input to produce the
stego object. These and other more common techniques will be described in the next section.
There are three categories of steganography: pure steganography, secret key steganography,
and public key steganography. Pure steganography requires no prior exchange of information
between the two parties communicating and relies on security through obscurity. This means that
the algorithm is not publicly known, and therefore the level of testing is also unknown, making the
tool unproven. One has to go on faith alone in those involved in the tool’s creation to be assured
covert communication. Numerous instances of the false sense of security through obscurity can be
cited [37]. One particularly interesting item noted in [51] was the Secure Digital Music Initiative
(SDMI) [45]. It was set up by the major players in the music industry who were attempting to create
a robust watermarking solution. They wanted to seek public scrutiny of their algorithms by means of a contest challenging people to break various watermarks. Unfortunately, any successful contestants
had to sign a secrecy agreement to collect their prize, essentially obscuring the weaknesses of
the solution to broader research and comment. Those who refused to sign the agreement and
presented their findings elsewhere [14] were threatened with legal action. Pure steganography
is used in this research to test the viability of SVG as a cover medium, but for an effective tool to
be developed, either secret key or public key steganography should be used.
Secret key steganography usually uses a publicly known algorithm, and relies on a secret
key chosen beforehand by the two parties communicating. This key is needed to both embed and
extract the hidden information, and if the proper key is not used, it cannot be known if data is
actually hidden in a given cover object. If prior secure—or, if desired, covert—communications
cannot be conducted to share the secret key before covert communications, another possibility is
public key steganography [48] [3]. It entails the sender using the recipient’s public key to embed the
information, which can only be detected using the recipient’s private key. This is analogous to how
the public key infrastructure works in cryptography. The interesting characteristic with public key
steganography is that even the sender should not be able to detect the secret message in the resulting
stego object. As another alternative, [48] proposes a steganographic key exchange protocol, where
the communicating parties exchange a sequence of messages that look like normal communications,
and at the end of the sequence each party is able to compute a shared key. This shared key can
then be used for secret key steganography. No matter how it is carried out, steganography is not
useful if the existence of secret information can be proven by outside parties.
The final concept that dictates the usefulness of a steganographic algorithm is bandwidth.
Some algorithms can always hide the same amount of information, regardless of the size of the
cover file, while the embedding capacity of others depends upon the size of the cover object. We
define bandwidth as the ratio of the size of the hidden information to that of the cover object.
Suppose with a particular steganographic algorithm we are able to hide 10 kilobytes of information
in a 100 kilobyte file. This would provide a bandwidth of $\frac{10}{100} = 10\%$. Similarly, embedding 25
kilobytes of information in a different 100 kilobyte file would be a bandwidth of 25%. Since some
steganographic algorithms’ embedding capacity varies with the cover object used, we take an
average of each algorithm’s bandwidth across a number of different cover objects.
Many steganographic algorithms use extra bits for a header and for error correction, which
take up some of the embedding space and therefore reduce the amount available for the information
itself. However, these are functions of the message formation, and in this work we are only looking
at methods of embedding information and not a particular tool. Therefore, we will include all
available space in bandwidth calculations.
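The bandwidth definition above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the tools developed in this work; the function names `bandwidth` and `average_bandwidth` are hypothetical, and sizes are assumed to be given in bytes.

```python
def bandwidth(hidden_bytes: int, cover_bytes: int) -> float:
    """Ratio of the size of the hidden information to the size of the cover object."""
    if cover_bytes <= 0:
        raise ValueError("cover object must be non-empty")
    return hidden_bytes / cover_bytes


def average_bandwidth(samples) -> float:
    """Mean bandwidth over (hidden_bytes, cover_bytes) pairs, one pair per cover object."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one cover object")
    return sum(bandwidth(h, c) for h, c in samples) / len(samples)


# The two examples from the text: 10 KB and 25 KB hidden in 100 KB covers.
trials = [(10 * 1024, 100 * 1024), (25 * 1024, 100 * 1024)]
print(f"{bandwidth(*trials[0]):.0%}")
print(f"{average_bandwidth(trials):.1%}")
```

Averaging the per-cover ratios, rather than dividing total hidden bytes by total cover bytes, is one reading of "an average of the algorithm's bandwidth"; it keeps one large cover file from dominating the figure.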
2.1.2 Techniques
There are numerous techniques today that implement steganography. Both [29] and [51]
provide excellent surveys of the creative ideas in the field. The most common deal with using images
as cover objects. They are usually simpler to understand, and hence provide a good introduction
into this niche of information hiding.
The primary example of a method used to embed secret information in images takes
advantage of the “noise”, or least significant bits, in the image. Substituting data in the least
significant portions of an image should not affect its overall look, making the modifications
that transport the secret data unnoticeable to the casual observer. More advanced methods
use techniques from signal processing such as spread-spectrum communications to scatter secret
information over the entire cover image. This can even take place across different bit-planes in an
image [30]. To provide a basis of comparison with our embedding techniques using SVG as a cover,
one of the tools analyzed in our testing is S-Tools [9]. It hides data in image files by spreading the
bits in a random walk through the least significant bits of the colors in the image. Though not a
part of our analysis, the tool also performs a similar embedding approach with sound files.
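To make the least-significant-bit idea concrete, the following Python sketch scatters message bits over the low bits of a byte sequence in a key-seeded pseudo-random order. It is a generic illustration and should not be read as the algorithm S-Tools actually uses. The names `embed_lsb` and `extract_lsb` are hypothetical, the cover is assumed to be a flat sequence of color bytes, and Python's `random` module stands in for whatever generator a real tool would use (it is not cryptographically secure).

```python
import random


def _positions(n_carriers: int, n_bits: int, key: int):
    """Key-seeded pseudo-random order in which carrier bytes are visited."""
    if n_bits > n_carriers:
        raise ValueError("message too long for this cover")
    return random.Random(key).sample(range(n_carriers), n_bits)


def embed_lsb(cover: bytes, message: bytes, key: int) -> bytes:
    """Overwrite the least significant bit of selected cover bytes with message bits."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    stego = bytearray(cover)
    for pos, bit in zip(_positions(len(cover), len(bits), key), bits):
        stego[pos] = (stego[pos] & 0xFE) | bit
    return bytes(stego)


def extract_lsb(stego: bytes, n_bytes: int, key: int) -> bytes:
    """Read the bits back in the same key-dependent order and repack them into bytes."""
    bits = [stego[pos] & 1 for pos in _positions(len(stego), n_bytes * 8, key)]
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


cover = bytes(range(256))            # stand-in for a run of color values
stego = embed_lsb(cover, b"hi", key=42)
print(extract_lsb(stego, 2, key=42))
```

Each carrier byte changes by at most one, which is why the modification is hard to see. The receiver needs the same key and the message length to retrace the walk.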
Other advanced methods apply a transform to a complex part of the image and act
on things such as the luminance or the color palette. These techniques take advantage of
the inability of the human eye to recognize slight variations in the colors of an image. Other
forms of image transformation provide opportunity for embedding, to include the popular Joint
Photographic Experts Group (JPEG) image format [31]. These techniques embed secret information
by manipulating the coefficients of the discrete cosine transformation (DCT) that is applied to an
image to compress it. Another steganography technique that takes advantage of a transform uses
MP3 audio files as cover objects [39]. The secret data is embedded during the compression of
the audio, altering the error-correcting coding process and hiding the information in the parity.
As a final transform domain example, GZSteg [8] provides a simple modification to the GZIP
compression algorithm to hide information. Within the tokens GZIP uses to cite repetitive
information throughout a file, GZSteg manipulates the calculated length of the repetitive data.
GZIP has no problem unzipping these stego files.
Understanding the statistical properties of the cover object and its domain (e.g., natural
images) can aid in the effectiveness of hidden information going undetected. Though earlier ideas
included modifying the statistical properties of a cover object in order to embed secret information
[6], today much of the focus is on maintaining the statistical properties of the cover object [19]
to increase security. The second tool analyzed for comparison with our embedding techniques,
Steghide [25], hides data in image files by taking a graph-theoretic approach and swapping pixels
to embed information while maintaining first-order statistics. Though not a part of our analysis,
the tool also performs a similar embedding approach with sound files.
Textual document files, such as those saved in Microsoft Word, also have interesting promise
as cover objects. Tweaking the spacing between lines, words, or even characters can be used to hide
information. Many of the techniques take advantage of subtle changes in formatting, and some try
to format text so that it is not visible in the user interface of the tool(s) used to view the document.
One older technique, before the days of computers, put tiny dots above relevant letters to encode
a secret message. The receiver required a magnifying glass to read the secret message. Other older
approaches involved invisible inks that required heat, water, or exposure to special light to become
visible. In digital files, white space can be “invisible ink” for a binary message. The third and final
tool analyzed for comparison with our embedding techniques, Snow [33], conceals messages in text
files by appending tabs and spaces on the end of lines.
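The white-space idea can be illustrated with a deliberately simplified Python sketch in which each trailing space encodes a 0 and each trailing tab encodes a 1. This is an assumed toy encoding, not the scheme Snow itself uses. The function names are hypothetical, and the cover text is assumed to carry no meaningful trailing white space of its own.

```python
def embed_trailing_ws(cover_text: str, message: bytes, bits_per_line: int = 8) -> str:
    """Append message bits to line ends: space encodes 0, tab encodes 1."""
    bits = "".join(f"{byte:08b}" for byte in message)
    chunks = [bits[i:i + bits_per_line] for i in range(0, len(bits), bits_per_line)]
    lines = cover_text.split("\n")
    if len(chunks) > len(lines):
        raise ValueError("cover text has too few lines")
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")    # drop existing trailing white space
        if i < len(chunks):
            line += chunks[i].replace("0", " ").replace("1", "\t")
        out.append(line)
    return "\n".join(out)


def extract_trailing_ws(stego_text: str) -> bytes:
    """Collect the trailing spaces and tabs of every line and repack them into bytes."""
    bits = ""
    for line in stego_text.split("\n"):
        tail = line[len(line.rstrip(" \t")):]
        bits += tail.replace(" ", "0").replace("\t", "1")
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


cover_text = "alpha\nbeta\ngamma\ndelta"
stego_text = embed_trailing_ws(cover_text, b"ok")
print(extract_trailing_ws(stego_text))
```

The visible text is unchanged, but the file grows by one character per embedded bit, and any editor that strips trailing white space destroys the message.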
In steganography there are also generation techniques that take a secret message and
produce a stego text from it without the need for a cover object. Not surprisingly, they usually
require user interaction to make sure that something nonsensical is not produced. One example of
this is a steganography technique that produces a text file using a context-free grammar, where the
specific productions that are used dictate the secret message [16] [50]. Another instance of this type
of tool takes a secret message and produces from it the recap of a baseball game, detailed pitch by
pitch. Similarly, a tool could generate an image using fractals, with properties set by the secret
data to embed [15].
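A much-reduced stand-in for grammar-based generation is sketched below in Python. It uses a single fixed sentence template rather than a full context-free grammar, and the choice among alternatives at each slot carries the secret bits. The vocabulary, the `SLOTS` table, and the function names are all invented for illustration; each slot is assumed to offer a power-of-two number of distinct words.

```python
# Hypothetical template: subject, verb, object. Option counts must be powers of two.
SLOTS = [
    ["Alice", "Bob"],                      # carries 1 bit
    ["likes", "sees", "paints", "buys"],   # carries 2 bits
    ["flowers", "boats"],                  # carries 1 bit
]


def _width(options) -> int:
    """Number of bits a slot with this many options can carry."""
    return (len(options) - 1).bit_length()


def generate(bits: str) -> str:
    """Produce a sentence whose word choices are dictated by the bit string."""
    words, i = [], 0
    for options in SLOTS:
        w = _width(options)
        chunk = bits[i:i + w].ljust(w, "0")   # pad with zeros if bits run out
        words.append(options[int(chunk, 2)])
        i += w
    return " ".join(words)


def recover(sentence: str) -> str:
    """Map each word back to the index of the alternative that was chosen."""
    bits = ""
    for word, options in zip(sentence.split(), SLOTS):
        bits += format(options.index(word), f"0{_width(options)}b")
    return bits


print(generate("1101"))
print(recover(generate("1101")))
```

A real grammar would allow recursive productions and much longer texts, and recovery would then require parsing the stego text rather than a simple word lookup.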
Scalable Vector Graphics (SVG) files are stored as Unicode text that is not formatted, so
the techniques presented that manipulate formats in textual files would not work. This moves the
focus to the aspects of how Scalable Vector Graphics files describe the text, graphics, and animation
that are rendered. The next chapter will cover the techniques reviewed for embedding information
into SVG files.
2.1.3 Steganalysis
In order to have any confidence that hidden information is actually safe from detection,
proper scrutiny of the algorithm used is required. This includes, but is not limited to, having
experts—aside from the steganography algorithm’s creators—work to detect any characteristics in
stego objects produced by the algorithm that point to steganographic modifications. Steganalysis is
the science of discovering, decoding, and/or destroying messages hidden in a cover file. Just being
able to find out that messages are being hidden is a feat. Efforts have been made to detect the
signature of some of the more popular steganography algorithms, but automated scanning tools are
not mature. Here we cite some of the important work in the field.
Looking at algorithms that hide information in the least significant bits of images, [20] have
found a spatial correlation in typical cover images where the least significant bit (LSB) plane can be
predicted to some degree based upon the other more significant bit-planes in the image. This allows
for the detection of subtle modifications amongst the least significant bits of an image. We can also
look at how an image is quantized using JPEG compression to detect the use of steganography in
the spatial domain [21]. Algorithms that look for (or create) areas of similarity in order to exploit
slight differences to hide information—such as manipulating the color table of a GIF image—are
also susceptible to statistical analysis [52]. Reviewing pairs of values in the histogram of color
frequencies within the color table of a GIF can show the use of this type of steganography.
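The pairs-of-values idea can be sketched as a chi-square statistic over a histogram. The Python below is a generic illustration in the spirit of the analysis described, not a reproduction of the cited method. It assumes the input values are indices whose least significant bit the embedding would overwrite, so that values 2k and 2k+1 form a pair, and the function name `pov_chi_square` is hypothetical.

```python
from collections import Counter


def pov_chi_square(values, n_levels: int = 256):
    """Chi-square statistic comparing each pair (2k, 2k+1) against equal frequencies.

    Returns (statistic, degrees_of_freedom). Overwriting least significant bits
    with random message bits pushes the two counts in each pair toward equality,
    so an unusually small statistic is the suspicious outcome.
    """
    counts = Counter(values)
    chi2 = 0.0
    pairs_used = 0
    for k in range(0, n_levels, 2):
        even, odd = counts[k], counts[k + 1]
        expected = (even + odd) / 2
        if expected > 0:                 # skip pairs that never occur
            chi2 += (even - expected) ** 2 / expected
            pairs_used += 1
    return chi2, max(pairs_used - 1, 0)


balanced = [0] * 50 + [1] * 50       # pair counts already equal
skewed = [0] * 100 + [2] * 100       # only the even member of each pair occurs
print(pov_chi_square(balanced))
print(pov_chi_square(skewed))
```

Turning the statistic into a probability of embedding requires the chi-square distribution with the returned degrees of freedom, which is omitted here to keep the sketch free of external libraries.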
As with Steghide, [41] developed an approach to hide information while maintaining first-
order statistics. However, by taking the novel approach of embedding the maximum possible amount of
information a second time in the suspect image using the same algorithm, [22] obtained a measure
that detects this use of steganography. They measured the difference in spatial discontinuities at the
boundaries of all 8x8 blocks in the suspect image before and after the test embedding; the difference was
significantly smaller if information was already embedded in the suspect image than if it previously
contained no hidden information. The idea is that any previously embedded information has already
affected this test statistic, which therefore also allows for an estimate of the length of
the hidden message.
Research has also been done into tools that focus not on specific steganography algorithms,
but on patterns that normally occur in digital images [53], so that potential stego images can
be identified. These are termed blind detection techniques, as they do not have prior knowledge
of any particular algorithm, and could be used to detect a range of steganography techniques. In
taking a general mathematical approach, [11] have been able to detect and extract information from
numerous techniques, including those using spread-spectrum steganography. One of the most cited
blind detection approaches uses a wavelet decomposition of images [17] [35]. This method, which
uses higher-order models and counters embedding methods that fix first-order statistics, requires
training on many cover and stego images in order to properly work.
Looking beyond the use of any given algorithm, there are some simple things that can be
done to discover the use of steganography. A lot of useful information can be extrapolated from
the patterns of e-mail users. If the same large image is consistently being sent between a pair of
users, this can signal the use of steganography. This is especially true if the image does not seem
to change, but the date/time stamp on the file does. Another giveaway, which some steganography
algorithms will alter (but should not), is the size of the cover file. Even if the size of the cover file
does not change, someone could take a cover object and its corresponding stego object (or even
two stego objects that use the same cover), compress them, and measure whether there is a
considerable difference between the compression ratios of the two objects. Minor differences may
be explained
by transmission errors or other subtle anomalies, but major differences can point to information
hiding.
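The compression comparison can be sketched as below. This is a minimal illustration that assumes both objects are available as byte strings and uses zlib as a stand-in for whatever compressor an analyst might choose; the function names are hypothetical, and what counts as a "major" difference is left to the analyst.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(zlib.compress(data, 9)) / len(data)


def ratio_gap(cover: bytes, stego: bytes) -> float:
    """Absolute difference between the two compression ratios."""
    return abs(compression_ratio(cover) - compression_ratio(stego))
```

Embedded data that has been encrypted tends to look random and compress poorly, so a stego object would typically show a higher ratio than its cover.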
To some extent this sort of analysis can be countered by using a different cover object
each time or through the posting of stego objects to Web sites instead of attempting direct secret
communication. But any abnormal behavior can throw up red flags. Taking our example of a secret
society in Section 2.1.1, if the members only visit the Web site or bulletin board containing the
stego image the week of the meeting and only then, the pattern of behavior could be detected. The
main point to be made is that the use of a reasonably undetectable steganography algorithm alone
does not ensure undetectable communications.
In taking a broader approach to find any use of steganography on the Internet, [42] wrote a
tool to crawl the USENET archives and eBay using a depth-first search. Every JPEG image found
was processed using some of the statistical steganalysis techniques previously presented. Testing
conducted before the search proved the ability to detect use of three popular steganography tools
that use JPEG images as cover objects—JPHide[34], JSteg[31], and Outguess[40]. Their search of
two million images flagged just over 3% as positives, but the payload in none of these images could
be extracted using a dictionary attack, leading the authors to believe they were false positives.
This contradicts the claims of [12] cited in the introduction, and continues the debate over the
extent of real-world use of steganography. Given that the USENET archives and eBay, though large,
do not
encompass the entire Internet, a different approach has been proposed to only search information
that is requested [7]. By monitoring HTTP traffic, sites that are not otherwise linked to can be
scanned, but dealing with the breadth of information gathered then becomes a challenge.
As with cryptography, in steganography there is a constant battle between those hiding and
those attempting to detect and extract. With steganalysis there is another option: to destroy any
possible hidden message rather than search for its existence. As far as images go, image processing
techniques such as smoothing, compressing, or transforming usually destroy any hidden data. These
are referred to as active attacks on a steganographic system, which are covered in the next section.
2.1.4 Attacks
Attacks are considered actions that run counter to what the steganographic system is
trying to do: hide the existence of secret data in a cover object. There are three kinds of attacks
on steganographic systems: passive, active, and malicious. The study of these is a major part of
steganalysis, with each approach having different goals.
Passive attacks consist of simply looking for the existence of hidden messages. As covered
in the previous section, statistical analysis can be done to detect modifications made by some
steganographic systems. This can include not only detecting the presence of hidden data, but
also attempting to gather as much information as possible about the embedding. Specifically, the
steganography algorithm used, any parameters of the algorithm, where the information is hidden in
the cover object, the estimated length of the message, and the key used are all sought. Overall, these
types of attacks seek to gain as much information as possible, with the ultimate goal being extraction of
any hidden messages. In this work we conduct passive attacks aimed at detection of the embedding
techniques devised to assess Scalable Vector Graphics files as a viable cover medium.
Active attacks consist of making changes to a cover object between the sender and
recipient(s) of a message. Steganographic systems are usually sensitive to cover modifications.
When images are used as cover objects, they are susceptible to image processing techniques such as
smoothing, filtering, and transformation to alternative formats. Active attacks are not concerned
with finding hidden data; rather, they focus on destroying any potential hidden information in
a cover object while maintaining an acceptable level of fidelity in the cover itself. Combating
steganography in network protocols, [18] defines “minimal requisite fidelity” as the threshold where
overt communications can take place but no other information is preserved. Unused bits are reset,
IDs are manipulated where possible, and any detected semantic anomalies (e.g., unknown source
and destination ports in a TCP packet) are adjusted. An example of an effective active attack
on a steganographic system that uses the least significant bits in an image file would be to just
randomize all of them. As with the LSB embedding technique, the attack would not harm the cover
object since these bits contribute the least to the image display.
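The LSB randomization attack can be sketched in a few lines. This illustration assumes the image samples are available as a byte string, one sample per byte; the function name is hypothetical, and a real attack would first decode the image format to reach the sample data.

```python
import secrets


def randomize_lsbs(samples: bytes) -> bytes:
    """Return a copy of `samples` with every least significant bit replaced by a random bit."""
    # Clear the lowest bit with a mask, then set it to a fresh random bit.
    return bytes((b & 0xFE) | secrets.randbits(1) for b in samples)
```

Each sample changes by at most one intensity level, which is why the cover remains visually intact while any LSB payload is destroyed.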
The third and final type of attack is the malicious attack. This is when someone forges
a stego object or initiates communication with another using a steganography tool under a false
identity. If secret key or public key steganography is not used, then this type of attack can be a
real problem. Without distinguishing information such as a secret key, it is hard for the recipient
of a stego object to verify the identity of the sender, especially if it is something downloaded from
a common Web site.
2.1.5 Security and Robustness
Security and robustness are two important aspects of any steganographic system. The
security of a steganographic system is defined as the difficulty in detecting data hidden within a
stego object produced by the system. In the next chapter we define a set of visual and statistical
tests that will be used to compare different embedding methods’ resistance to detection.
In addition to the detectability of the use of a particular embedding method, when a
complete steganographic system is being developed there are actually four other requirements that
should be implemented to make the system more secure. First, secret data should be encrypted and
embedded using a public algorithm and secret key. This way, a user can understand the limits of the
system, and still keep their data safe. The second requirement is that only the holder of the correct
key can detect, extract, and hence prove the existence of any hidden message in a cover object.
Outside parties should not even be able to find statistical evidence that hidden data exists. Third,
even if the contents of one piece of hidden data are known, there should be no chance of detecting
others. And finally, it should be computationally infeasible to detect hidden information. Just as
with cryptography, any method that could break the system should not be worth the computing
power needed.
A steganographic system is considered robust if the embedded information cannot be altered
without making drastic changes to the stego object. This is difficult to do, and steganographic
systems can usually only focus on being robust against a very specific set of mappings or attacks
(e.g., conversions to a certain format). If desired, the message itself can be made more robust by
having error correcting codes inserted under encryption. Redundancy can also be employed to help
ensure the survival of hidden data, or the data can be placed in perceptually more significant
parts of
the image, using embedding rules that operate within some transform domain. It is important
to point out though that robustness is not enough against a malicious attacker, and there is a
tradeoff between robustness and security. The more robust a steganographic system is made, the
more structured the information becomes, making it easier to detect through statistical passive
attacks. Also with an increase in robustness comes the need for more bits, hence decreasing the
useful bandwidth of the system.
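The cost of redundancy can be seen in a simple repetition code, sketched here as an illustration only. The function names are hypothetical, the repetition factor `n` is assumed to be odd so that a majority always exists, and practical systems would use stronger error correcting codes.

```python
def encode_repetition(bits, n=3):
    """Repeat every message bit n times."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded, n=3):
    """Recover each message bit by majority vote over its n copies."""
    return [
        1 if sum(coded[i:i + n]) * 2 > n else 0
        for i in range(0, len(coded), n)
    ]
```

With n = 3 the message survives any single flipped bit per group, but three cover bits are consumed for every message bit, which is the bandwidth penalty noted above. The regular structure of the repeated bits is also the kind of pattern a statistical passive attack can exploit.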
2.2 Scalable Vector Graphics
2.2.1 Raster Images
The most common graphics on the Internet are considered raster images because they
are defined as a matrix of pixels, each with its own defined color. The quality of these raster
images is expressed in dots per inch (dpi) where the higher the dpi, the better the image quality. Not
surprisingly, another side effect that comes with an increase in quality is an increase in the resulting
image’s file size. Bitmap files follow this format precisely, defining the location of each pixel and
its color, but the most common files of this type found on the Internet are Graphics Interchange
Format (GIF) and Joint Photographic Experts Group (JPEG) images. For efficient delivery of
content over the Internet, a balance must be struck between quality and file size. The reason why
GIF and JPEG images are so popular on the Internet is because they each take approaches to
reduce the size of the file for quicker transmission while maintaining a level of quality higher than
a normal bitmap image of equivalent file size.
One weakness of the raster formats, though, comes in magnification. This may occur if the
image is of a map and a closer view of a section of streets is desired, or if an image needs to be
clipped and blown up to the size of the original image. As raster images are blown up, smooth
lines will become jagged, and the image will be seen for what it truly is, a set of blocks of color.
The higher the level of zoom, the bigger the blocks. How quickly a raster image falls into this state
depends on the quality in which the image is stored. Therefore, the compromise between quality
and file size not only has to address transmission speed, but also potential magnification. One
approach to this problem, currently used by MapQuest [36], is to have multiple files of the same
image area—a map in their case—stored at different magnifications. This allows them to provide a
small number of magnifications, and the time it takes for a new magnification to be shown becomes
dependent upon transmission speed.
Also with raster images, a resolution problem comes up when we look at the various devices
upon which Internet content is being delivered. Most common are personal computers and laptops
with monitors that provide multiple resolution settings. Depending on the viewable size of the
monitor, typically a resolution is set so that text can be easily read. Images must be sized to the
lowest common monitor resolution in order for them to be seen in their entirety on a Web page. This
type of sizing deals with the number of pixels the image uses on the content delivery system (e.g.,
the monitor) and, though related to file size, is not the same thing. Standard monitor resolutions
include 640x480, 800x600, 1024x768, and 1280x1024, all measured in pixels. Newer models,
including widescreen displays, offer even more options. Many sites on the Internet assume monitors
to be set at 800x600 and are hence best viewed at that resolution. This means that images on these
sites will potentially not fit on monitors with a smaller resolution, wasting the transmission time
of sending a larger image. Also, for those with a larger resolution monitor who want to magnify
images—maybe to stretch them for a screen wallpaper background—quality becomes a concern.
This headache of trying to design to encompass most monitor resolutions is magnified today by the
addition of various handheld devices that can receive rich Internet content. These include Personal
Digital Assistants (PDAs), Pocket PCs, and even cellular phones. Some of these devices today are
achieving resolutions of 320x240 and even 640x480.
2.2.2 Vector Images
Vector images, unlike their raster counterparts, are defined by objects rather than pixels.
Each object in a vector image is composed of a collection of points connected by lines and curves to
define its shape, while color can be specified once each for its area and boundary. Compare this to
a bitmap image where each and every pixel has a defined color. The way that vector images achieve
this level of simplicity in definition is through their use of software for rendering. The definitions
of vector objects translate into instructions for a software engine to draw the image on a particular
output device. This way, a broad range of output devices can be supported, and an image will
not lose quality (i.e., resolution is not a factor) in its display on any given device. It then follows
that magnification of a vector image will not produce the blocking found in raster images, since
the image is redrawn to show a particular level of detail. Also, when printing vector images the
maximum output resolution of the printer can be used.
Simple shapes can be defined with ease in vector images. For example, a circle can be
defined by the coordinates for its center along with the measurement of its radius. This is the
minimal amount of information needed to create a circle. Similarly, drawing a rectangle only
requires the coordinates of one of its corners along with its height and width. This saves on the
size of vector image files, making them ideal for the delivery of graphical content over the Internet.
Hence, the balance of quality and file size demanded by raster images is not required by vector
images.
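To make the size argument concrete, the following sketch compares the markup for a single circle with the storage needed for an uncompressed raster of the same scene. The helper names are hypothetical, the `circle` element and its `cx`, `cy`, and `r` attributes are those of SVG (one of the vector formats discussed later in this chapter), and the 24-bit uncompressed bitmap is an assumption chosen for simplicity; GIF or JPEG compression would narrow the gap.

```python
def svg_circle(cx, cy, r, fill="black"):
    """Markup for a circle: center coordinates, radius, and one fill color."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{fill}"/>'


def raw_bitmap_bytes(width, height, bytes_per_pixel=3):
    """Storage for an uncompressed raster that records a color for every pixel."""
    return width * height * bytes_per_pixel


circle = svg_circle(320, 240, 100)
print(len(circle), "bytes of markup versus", raw_bitmap_bytes(640, 480), "bytes of pixels")
```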
Vector images overcome the weaknesses identified in raster formats, but they certainly have
some weaknesses of their own. The one that initially stands out is the requirement for a specialized
piece of software to render any vector image. More specifically, there is the reliance on the ability of
this software to render images efficiently. The more complex an image is, the more difficult and time
consuming it is to render, making vector images more dependent on the CPU than raster images.
The upside is that this should improve with time as software engines improve and processor speeds
increase. Also, most machines today have Graphics Processing Units (GPUs) specifically for visual
rendering. Some common vector image file types are Encapsulated PostScript (EPS), CorelDraw,
AutoCAD, Windows Metafile (WMF), Macromedia Flash, and Scalable Vector Graphics (SVG).
2.2.3 The Scalable Vector Graphics (SVG) Format
Scalable Vector Graphics (SVG) is the first open vector graphics standard for the Web,
intended for producing two-dimensional vector graphics where zooming and panning are possible
with no loss of quality to the rendered image. It became a World Wide Web Consortium (W3C)
Recommendation in the latter half of 2001, with the latest revision coming in 2003 [43] along with
a mobile version [44]. It is an Extensible Markup Language (XML) based graphics standard, so
it integrates well with other XML Document Type Definitions and can make use of Extensible
Stylesheet Language Transformations (XSLT) for visually representing many types of data. While
there is native browser support for the language in only a few packages (Mozilla, Konqueror, etc.),
Adobe and Corel both have SVG viewers that plug into the Internet Explorer and Netscape
browsers.
Scalable Vector Graphics files can contain syntax to display vector graphics shapes, text,
and even raster images. Objects in an SVG image can be grouped, styled, and transformed.
Gradients, patterns, masks, filter effects, and simple alpha blending are available to provide rich
graphical content. In addition, animation elements are defined in the recommendation and SVG
images can be designed to be interactive with scripting. Since it is a vector graphics format, SVG
is resolution independent, allowing images to be rendered relative to the size of a browser window.
This all comes together to provide a powerful toolset for the creation and display of two-dimensional
graphics.
The format is popular with cartographers as a simple, open solution to delivering interactive
maps over the Web [38], as opposed to a proprietary format with limited interface capabilities and
limited extensibility. For developers there are many tools available that can allow for easy viewing
and editing of SVG due to its roots in XML. Also, any programming language that can manipulate
text files—or better yet, parse XML—can generate and manipulate SVG. In the open source realm
there is also the Apache Batik project [4]. It is a Java-based toolkit providing a set of code modules
for the generation, manipulation, and even viewing of SVG images. Using these modules allows
developers to more easily integrate SVG into their client as well as server-side Java applications.
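As an illustration of how little is needed to generate and manipulate SVG from a general-purpose language, the sketch below uses the XML parser in Python's standard library rather than Batik. The helper names `build_svg` and `recolor`, and the rectangle's dimensions, are hypothetical; only the SVG namespace URI and the element and attribute names come from the recommendation.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)


def build_svg(width, height):
    """Create an svg document element containing a single rectangle."""
    root = ET.Element(f"{{{SVG_NS}}}svg", {"width": str(width), "height": str(height)})
    ET.SubElement(
        root,
        f"{{{SVG_NS}}}rect",
        {"x": "10", "y": "10", "width": "100", "height": "50", "fill": "blue"},
    )
    return root


def recolor(root, tag, fill):
    """Set the fill attribute on every element with the given SVG tag name."""
    count = 0
    for element in root.iter(f"{{{SVG_NS}}}{tag}"):
        element.set("fill", fill)
        count += 1
    return count


document = build_svg(640, 480)
recolor(document, "rect", "red")
print(ET.tostring(document, encoding="unicode"))
```

The same attribute-level access is what makes SVG interesting as a cover medium: any program that can parse XML can read and rewrite the values that describe the image.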
Nuts and Bolts of SVG
This section is an introduction to the basic ideas of SVG to provide a context for the
information embedding techniques described in the next chapter. Therefore, this is not by any
means as complete as a text based solely on SVG. For a more in-depth treatment of the material, see
books such as [49] and [10], and of course the World Wide Web Consortium (W3C) recommendation
[43]. A simple example is provided in Figure 2.1 with its related source in Listing 2.1 to explain
some of the features of the SVG markup.
In our example, the svg element is the document element, with all its children (i.e., elements
contained within it) describing the image to be rendered. The width and height attributes define
the extent of the SVG viewport, which is the area where content is rendered. In our example, the
SVG content covers the entire page. The viewBox attribute scales the coordinate space within
the viewport, with the first two numbers defining the (x, y) coordinate of the top-left corner, and
the second pair defining the width and height of the region mapped onto the viewport (equivalent
to the bottom-right corner when the origin is (0, 0)). Therefore, in viewing this SVG document on
a monitor with resolution greater than 640x480, each unit in the coordinate system covers more
than one pixel, making the image appear larger. Similarly, at lesser resolutions the image will
appear smaller. It is possible for another svg element to be nested within the document element,
with its own coordinate space defined, to allow for the simple movement and independent sizing
of portions of an image. If conflicts of scale occur, there is a preserveAspectRatio attribute to
provide direction to the SVG rendering engine as to how to preserve aspect ratios.
Figure 2.1. Hello World SVG Example Image
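The scaling effect of viewBox can be worked through numerically. The sketch below assumes a viewBox of "0 0 640 480", inferred from the 640x480 figure mentioned above, and assumes the default preserveAspectRatio behavior of uniform scaling so that the whole box fits inside the viewport; the function name is hypothetical.

```python
def viewbox_scale(viewbox, viewport_width, viewport_height):
    """Pixels per user unit when a viewBox is fitted uniformly into a viewport."""
    # viewBox holds four numbers: min-x, min-y, width, height.
    _min_x, _min_y, width, height = (float(v) for v in viewbox.replace(",", " ").split())
    # Uniform scaling: use the smaller factor so the entire box stays visible.
    return min(viewport_width / width, viewport_height / height)


print(viewbox_scale("0 0 640 480", 1280, 960))  # each unit spans two pixels
print(viewbox_scale("0 0 640 480", 320, 240))   # each unit spans half a pixel
```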
Aside from the units of the coordinate system, other units of measure can be specified to
size and position graphics and text. In the example, the width and height of the viewport are
specified as a percentage of the width and height of the browser window. Items can also be sized
relative to the default font size. The absolute measurements available for use are pixels, points (72
per inch), picas (6 per inch), centimeters, millimeters, or inches.
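The relationships among the absolute units can be expressed as a small conversion helper. This is a sketch only: the function names are hypothetical, and the number of pixels per inch is left as a parameter because it depends on the rendering environment; the default of 96 is an assumption, not a value taken from this work.

```python
# Number of each unit in one inch.
UNITS_PER_INCH = {"in": 1.0, "pt": 72.0, "pc": 6.0, "cm": 2.54, "mm": 25.4}


def to_inches(value, unit):
    """Convert an absolute length to inches."""
    return value / UNITS_PER_INCH[unit]


def to_pixels(value, unit, dpi=96.0):
    """Convert an absolute length to pixels for an assumed pixels-per-inch value."""
    if unit == "px":
        return float(value)
    return to_inches(value, unit) * dpi
```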
The other major piece of overall presentation of SVG content is defined through styles.
Individual elements of style, such as stroke and fill color, are attributes on SVG graphical
elements. Cascading Style Sheet (CSS) declarations are used in the style attribute, as presented in the
example. Our example also includes a style element, where internally an entire CSS style sheet can
be specified, though here it is used only to define the default font. As with other XML documents,
SVG can also be styled by referencing an externally defined CSS style sheet. Most of the SVG
documents we test for embedding hidden information (see appendix A) define styles in yet another
way: by adding entities to the document type definition.
Listing 2.1. Hello World SVG Example Source