Streaming-Based XML Encryption and Decryption
Chair for Network- and Data Security
Horst Görtz Institute for IT Security
Ruhr-University Bochum
Benjamin Sanno Matriculation Number: 108006248774
Supervisors: Prof. Dr. Jörg Schwenk, Juraj Somorovský
October 14, 2010

Declaration of Authorship
I hereby declare
• that I have written this thesis without any help from others and without the use of documents and aids other than those stated below,
• that I have mentioned all used sources and that I have cited them correctly according to established academic citation rules,
• that I have produced this thesis without the prohibited assistance of third parties and without making use of aids other than those specified,
• that this thesis has not previously been presented in identical or similar form to any other German or foreign examination board.
Bochum, October 14, 2010
Benjamin Sanno
Abstract
XML Encryption is a W3C recommendation that specifies how XML elements should be encrypted. Thereby, message confidentiality can be achieved. However, conventional frameworks applying XML Encryption use DOM-based XML processing. The DOM API is tree-based and therefore the whole document must be parsed before data can be encrypted or decrypted. In contrast, SAX and StAX perform streaming-based XML processing and their output is a stream of events. So far there are no efficient and fast frameworks that apply XML Encryption to a stream of XML events. In this thesis, an event pipeline concept is used to further process the output of streaming-based XML APIs. Efficient and fast event pipeline modules are proposed that facilitate encryption and decryption. Each module was implemented and the decryption modules were analyzed to determine which is most efficient. Measurements reveal that an event pipeline which uses streaming XML parsers has advantages over the DOM API with regard to memory requirements and execution time for the parsing and decryption process.
Contents
1 Introduction
2 Related Work
  2.1 XML
  2.2 XML APIs
    2.2.1 SAX - Simple API for XML
    2.2.2 DOM API
    2.2.3 StAX - Streaming API for XML
  2.3 Event Pipeline
3 Design Concepts
  3.1 Nested XML Encryption and Decryption
  3.2 Stream Encryption
  3.3 Stream Decryption
    3.3.1 Push-Pull Problem on Event Streams
    3.3.2 Event-Stream Decryption
    3.3.3 Byte-Stream Decryption
4 Implementation
  4.1 Event Pipeline
  4.2 Encryption Module
  4.3 Decryption Modules
  4.4 Modification of the Javolution Parser Source Code
5 Performance Analysis
  5.1 Experimental Setup
    5.1.1 Measuring Execution Times
    5.1.2 Measuring Memory Usage
  5.2 Excursion: Parser Analysis
  5.3 Excursion: Base64 Decoding Performance
  5.4 Parsing Analysis
    5.4.1 Memory Usage
    5.4.2 Execution Time
  5.5 Decryption Analysis
    5.5.1 Memory Usage
    5.5.2 Execution Time
6 Conclusion
List of Figures
2.1 Simple API for XML [17]
2.2 Transformation of a document to its DOM representation [7]
2.3 Sun's Project X reference implementation of a DOM API [17][28]
2.4 Streaming API for XML
2.5 Basic event pipeline design
3.1 Event pipeline configuration for nested encryption or decryption
3.2 Internal design of the encryption module
3.3 Core design problem: interface between character events and the parser
3.4 Push-pull problem on streams
3.5 Decryption module that implements simple buffering
3.6 Decryption module that implements thread-based decryption
3.7 Illustration of the CharactersInputStream component
4.1 Prototype suite package overview
4.2 Inner body of the pipeline package
4.3 Strongly simplified call graph of the pipeline implementation
4.4 Call graph of the encryption module
4.5 Internal dependencies of the decryption package
4.6 Internal dependencies of the ESBufferDecrypterModule
4.7 Internal dependencies of the ESThreadDecrypterModule
5.1 Parser: heap size over time during parsing of 200.000 XML elements that have either the same tag name or all different tag names
5.2 Xerces Parser: MAT analysis results
5.3 Parser: heap size over time during parsing a completely encrypted XML document
5.4 Base64 performance analysis
5.5 Parser modules: heap size over time during parsing of 200.000 XML elements (150 different names)
5.6 Parser modules: performance analysis
5.7 Decryption modules: heap size over time during parsing and decryption of 200.000 XML elements
5.8 Decryption modules: heap size over elements during parsing and decryption of 100.000 XML elements
5.9 Decryption modules: heap size over decryption progress in percent (200k elements, 8MB file)
5.10 Decryption modules: performance analysis
1 Introduction
The Extensible Markup Language (XML) [1] is used in many modern applications, e.g. web services, cloud computing, database management, Service Oriented Architectures (SOA), et cetera.
XML Encryption XML files can contain sensitive and confidential data. Therefore, security and especially privacy are important. To achieve these goals, the computationally intensive encryption of XML data based on the XML Encryption recommendation [5] is necessary.
XML APIs A common Application Programming Interface (API) to access XML files is the DOM API [6]. The Document Object Model (DOM) is an in-memory representation of the XML file. This API is usually used to read an XML file. However, a disadvantage is that much system memory is occupied when large files are parsed. Furthermore, multiple DOM objects must be created and stored in memory if XML Encryption is used. Another concept is that of streaming-based XML APIs like SAX (Simple API for XML) [14] or StAX (Streaming API for XML) [19]. These do not create an in-memory representation of the XML file. Their output is a stream of XML events, not a DOM. Streaming-based concepts have great advantages over tree-based concepts. XML element data becomes available to the subsequent application as soon as the parser has read it. This results in low latency, efficient use of CPU cycles and more reactive client applications. Another advantage of the streaming concept is that XML source files can be larger than the system memory. For instance, XML database files of insurance companies, in cloud computing environments, and others can have sizes up to 10GB or more [29, 30]. Some modern XML applications are executed on mobile devices with limited computing and memory resources, so that a DOM object would occupy a large proportion of those limited resources [31].
Streaming-Based Processing "It can be argued that the majority of XML business logic can benefit from stream processing, and does not require the in-memory maintenance of entire DOM trees." [18]. An event pipeline pattern can be used to apply complex functionality to streaming-based XML APIs. To address confidentiality of information, an XML event pipeline should realize the XML Encryption recommendation to enable cryptographic functionality for those XML APIs. Until now, there has been little scientific work in this field of IT security [2, 3, 4].
Prototype Implementation The main goal of this thesis is to design efficient and fast event-stream encryption and, above all, decryption concepts for event pipeline modules. These concepts are implemented to demonstrate their functionality. The final decryption component should be able to process nested encrypted data as efficiently as possible. Although the source XML file contains nested and encrypted XML elements, the final implementation should be able to process the elements strictly in sequential order. Otherwise, the advantages of the event pipeline pattern may be lost.
Optimization Usually, network bandwidth as well as execution time, latency and efficient memory usage are crucial factors in computing environments. As a consequence of this, it is very important to address performance and processing efficiency. Therefore, the performance of the prototype implementation was extensively analyzed for this thesis.
The next chapter explains the related work. Basic concepts like XML, XML Encryption, the event pipeline pattern, and XML APIs are described. Chapter three is about design concepts for streaming-based encryption and decryption. The fourth chapter deals with the prototype implementation details. Finally, in chapter five the performance analysis results are illustrated and evaluated.
2 Related Work
The following sections introduce the basic concepts that are necessary to understand the subsequent chapters. Terms and acronyms are defined and specified, e.g. XML, API, SAX, DOM, tree-based, streaming-based, base64, event pipeline, et cetera.
2.1 XML
The acronym XML stands for Extensible Markup Language. It has been a W3C recommendation since 1998 [1]. The W3C is a community of information technology experts. This organization publishes technical specifications and recommendations to ensure the long-term growth of the World Wide Web. XML was designed to transport and store data. Originally, it was intended to be human readable to some extent for debugging and other administrative work. An XML document consists of markup and character data. Markup is defined as all tags, references, declarations, sections, and comments. Character data is all text that is not markup ([1] Section 2.4). There are rules governing how markup elements can be used, which results in strict, tree-structured documents. A simple XML example is shown in Listing 2.1. The core concept is the XML element, which consists of a start-tag, an end-tag and content. Tags must be well-formed, i.e. for each start-tag an end-tag with the same name must exist in the document. Otherwise, a parser would throw an exception. All tags must be nested correctly. That means all elements must be closed in the reverse order of their start-tags. Every XML document has a header and a body. The header specifies that the text file is an XML document. Listing 2.1 is a simple XML document and its header is the first line, the XML declaration. Beside the version attribute, it can also contain attributes like encoding or standalone to signal to the XML parser how to interpret the body's content correctly.
[Listing 2.1: a simple two-line XML document — an XML declaration header followed by a single element]
XML Encryption The W3C recommendation document for the standardization of XML data encryption is called "XML Encryption Syntax and Processing" and was published in 2002 [5]. It specifies how encrypted data must be embedded in an XML structure. The ciphertext must be base64 encoded and surrounded by a special set of XML elements: the <EncryptedData> element and its children, shown in Listing 2.2.

[Listing 2.2: the <EncryptedData> structure — an <EncryptedData> root element with an <EncryptionMethod> element naming the algorithm and a <CipherData> element whose <CipherValue> child contains the base64 encoded ciphertext]

The <EncryptedData> element replaces the encrypted element or content in the document, so that only this wrapping structure and the base64 encoded ciphertext remain visible to a parser.
2.2 XML APIs
An application programming interface (API) is a software component that resides between different software applications. It facilitates standardized interaction and communication between those components. There are a few XML APIs that follow various objectives. The following subsections introduce some important XML APIs and explain their specifics in detail. First, SAX is described. Then, the document object model is explained, along with how it is derived from an XML source. Finally, the StAX interface is explained in detail.
2.2.1 SAX - Simple API for XML

SAX is a modern XML API although it has not changed significantly over the last years. The concept was originally implemented by David Megginson in 1997 [13]. The current final version is SAX 2.0.2, released on April 28th, 2004. It is open source and can be downloaded from SourceForge (http://sourceforge.net/). The Simple API for XML (SAX) is only able to read XML data. Writing is not supported [18]. A SAX parser reads the XML source and for every detected XML item (e.g. tags, attributes, namespaces, etc.) it creates an event, so that a stream of events is generated while it is parsing. This results in low and consistent memory usage during processing (in theory). SAX creates a series of events and pushes these to the client application. For this reason, this API is called event-based (or streaming-based). The client application has to be reactive to the parser because the parser controls when new events are created and pushed. For this reason, this XML API is also called an active API [10].
[Figure 2.1 shows the SAX components: a SAXParserFactory creates the parser, which reads the XML source from an InputStream and pushes callbacks — startDocument(), startElement(), characters(), endElement() — to the client application via the registered ContentHandler, ErrorHandler, DTDHandler, and EntityHandler.]

Figure 2.1: Simple API for XML [17]
Figure 2.1 shows which components work together. The SAXParserFactory class (the current JRE provides the XMLReaderFactory) creates an instance of a SAX parser. It is system dependent which parser variant is the system default. There are SAX parsers from different companies and developers like Oracle, Sun, and Apache, or even specific solutions such as the Javolution SAX2 parser. Once a SAX parser is created, it is connected to an XML source and to handlers. The ContentHandler is important because the parser invokes ContentHandler methods whenever a new XML item is read from the source. The client application reacts to those method calls, i.e. for each of those calls it can execute some code, but it is not able to force the ContentHandler or the parser object to read further and call the next method. This is why the parser of SAX is called
a push parser.
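The push model described above can be sketched with the JDK's SAX API. This is a minimal illustration, not code from the thesis prototype; the counting handler exists only to show that the parser, not the client, drives the callbacks:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSketch {
    // Counts start-tags: the parser pushes one startElement() call per tag.
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++; // invoked by the parser, not by the client
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }
}
```

The client code can only react inside the handler methods; it cannot ask the parser for the next event, which is exactly the "active API" property described above.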
2.2.2 DOM API

DOM API is an acronym for Document Object Model Application Programming Interface. It has been a W3C recommendation since 1998 [6]. This API concept is a platform and language independent interface that specifies how to represent and interact with documents like XML, HTML and similar languages. One of its basic features is the ability to provide dynamic access to the document's content. Furthermore, the DOM API is able to dynamically update the style and content of the document, i.e. get, add, modify or delete XML elements at arbitrary locations within the document (random-access manipulation). Hence, DOM should be used if the application accesses the content of an XML file multiple times at different positions during its execution. If the content of an XML file is not accessed in the same sequential order as it is stored in the file, DOM is a very good solution. To provide this functionality, the tree-structured source, i.e. the HTML or XML file, has to be transferred completely into system memory. All source bytes must be read and stored in main memory.
[Figure 2.2 shows an HTML document containing a table (with the rows "Shady Grove / Aeolian" and "Over the River, Charlie / Dorian") next to its DOM representation: each element and text node of the source becomes a node in the DOM tree.]

Figure 2.2: Transformation of a document to its DOM representation [7]
Figure 2.2 illustrates the W3C recommendation (see [7]) how a document should be transformed. An XML file consists of XML elements, and every XML element is represented by a DOM node. The document’s structure is represented in a DOM node-tree, so the branches in the DOM tree represent the object relations and hierarchy. For this reason, DOM is called tree-based. The DOM representation of an XML structure is also called XML DOM object.
[Figure 2.3 shows the DOM construction chain: a DocumentBuilderFactory creates a DocumentBuilder, which uses a SAX parser together with a DocumentHandler, ErrorHandler, DTDHandler, and EntityHandler to read the XML source from an InputStream and build the DOM object in memory.]

Figure 2.3: Sun's Project X reference implementation of a DOM API [17][28]
Figure 2.3 illustrates which components work together to transform an XML source into a document object that is stored in the main memory. In the example, Sun Microsystems' implementation of the DOM API is shown, but there may of course be different DOM API implementations. DOM is just a model. Therefore, any DOM API must use a parser to read an XML file. Because they build on different SAX parser implementations, the following DOM APIs can be distinguished (see also http://xerces.apache.org/xerces-j/faq-migrate.html):
• Apache Xerces project: org.apache.xerces.parsers.DOMParser() (uses SAX parser in package org.xml.sax.*)
• Oracle DOM parser: oracle.xml.parser.v2.DOMParser() (uses SAX parser in class oracle.xml.parser.v2.SAXParser())
• Sun DOM parser: com.sun.xml.tree.XmlDocument.createXmlDocument(uri) (uses SAX parser in package com.sun.xml.parser.Parser())
A Factory class (e.g. javax.xml.parsers.DocumentBuilderFactory) configures and obtains the parser variant that is used. In contrast to SAX, a DocumentBuilder object is instantiated by an instance of the DocumentBuilderFactory class. A DocumentHandler is created instead of a ContentHandler. This special handler is able to create a Document object (DOM object) while the parsing process is executing. Once parsing is done, the DocumentBuilder instance returns a Document object. Then, a client application can use the DOM object interface to access the DOM object and traverse the XML structure.
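The factory/builder chain described above can be sketched as follows. This is a minimal illustration with JDK classes only, not thesis code; note that nothing is accessible until parse() has read the entire source into memory:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomSketch {
    // Builds the complete in-memory Document, then accesses it randomly.
    public static String rootName(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Only after the whole file is parsed can the tree be traversed:
        return doc.getDocumentElement().getTagName();
    }
}
```

The full tree is held in memory for the lifetime of the Document object, which is the memory cost discussed in the introduction.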
2.2.3 StAX - Streaming API for XML

StAX is another conventional XML API. SAX as well as StAX is open source. The StAX 1.2 final release is the last officially released source code, published on June 19th, 2006. Binaries, source code and documentation can be downloaded from the official homepage [19], but the sources included in the Java Development Kit (JDK) or similar should be newer (e.g. XMLInputFactory is version 1.2 in JDK 1.6.0_20 and version 1.0 in the stax-src-1.2.0 package from the official homepage). StAX is part of the Java Platform SE since version 6 and is therefore further developed in this context. The StAX concept is formally specified in JSR-173 [20]. These are the StAX design goals:
• API for reading and writing XML documents (symmetrical bi-directional API)
• efficient, extensible, simple and modular
• J2ME compatible
StAX provides an interface to read XML data similar to SAX and furthermore to write XML documents. In contrast to SAX, it uses a pull parser design to read XML sources. Hence, such a parser is reactive to method calls from subsequent program components. Therefore, the client application is in control of the parser. This is the exact opposite of application designs that use SAX. Both SAX and StAX use event-based parsers to read the source, and the advantages of event-based processing are the same: low memory usage, because no in-memory document representation is created, and low latency for the client application, because output is created from the very first event. A restriction is that all event-based parsers are strictly sequential. They can only go forward in the event sequence. If a component of the client application requires access to multiple events in the XML structure, it has to store the events that it needs during the parsing process.
[Figure 2.4 shows the StAX design: an XMLInputFactory creates an XMLStreamReader, the pull parser reads the XML source from an InputStream, and the client application drives it with while(hasNext()) and next() calls.]

Figure 2.4: Streaming API for XML
Figure 2.4 demonstrates the simple design of this streaming API. A new instance of the parser is created by the XMLInputFactory class, and the XML source is passed to the parser object. When this is done, the client application can iterate over all XML events that the parser is able to read. There are two types of StAX interfaces: iterator-based (XMLEventReader) and pointer-based (also cursor-based) (XMLStreamReader). The Java EE Tutorial 5 explains which class should be used: "If performance is your highest priority (for example, when creating low-level libraries or infrastructure), the cursor API is more efficient. If you want to create XML processing pipelines, use the iterator API." ([18] on page 546). The XMLEventReader class works on top of the XMLStreamReader, and hence the performance of the latter should be slightly better (see also [21]). Therefore, when using Sun's StAX implementation (SJSXP) the "cursor API (...) is the most efficient way to read XML data" [22].
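The pull model with the cursor API can be sketched as follows; a minimal illustration (not thesis code) in which the client, not the parser, decides when the next event is read:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxSketch {
    // The while(hasNext())/next() loop from Figure 2.4: the client pulls.
    public static int countStartElements(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        while (reader.hasNext()) {                 // client is in control
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                count++;
            }
        }
        reader.close();
        return count;
    }
}
```

Compare this with the SAX sketch in the previous subsection: the same counting logic, but inverted control flow.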
2.3 Event Pipeline
Figure 2.5 shows an illustration of an XML event pipeline. Detailed explanations can be found in David Brownell's book SAX2 [10, Chapter 4.5] or in the paper by Grushka, Jensen, et al. on design patterns for event-based processing [3]. This abstract design consists of consecutive software components called modules. New modules can be added or inserted statically at start-up or dynamically at runtime.
[Figure 2.5 shows the basic event pipeline: a parser module reads the XML source from an InputStream and creates an event sequence; subsequent modules read, transform, absorb, or create events; the last module produces the output.]

Figure 2.5: Basic event pipeline design
The arrows in the figure illustrate how data is pushed through the modules. Data is transmitted from module to module via XML events. In general, these events consist of the payload and metadata. All data references that are transmitted between modules are appended to XML event objects. It is important that XML events do not carry complete data objects, only references. Then, the pipeline can be memory efficient and fast.
Event objects need not be processed by every module. A module can pass an event through or decide to process it, depending on its functionality. Events can also be created by a parser module or any other module of the pipeline. A parser module creates a new event for each XML item that it reads from the source. Therefore, XML events are created for each start tag, end tag, namespace declaration, start of document, end of document, or character block between tags. Events can also be absorbed or generated within the pipeline, as illustrated in Figure 2.5. In more complex applications it could be necessary to send information backwards through the pipeline, i.e. a module should be in a position to push events to the previous module. For example, a web service client application can cancel the transmission and stop the decryption of an encrypted SOAP message ([34]) exactly when an erroneous or invalid XML element has been detected. This message "could be dropped without wasting valuable processor capacity for cryptographic operations" [4] to avoid, for example, an Oversize Payload attack. This attack is similar to conventional buffer overflow attacks: "an attacker may exploit a vulnerability in a Web Service by sending overtly large XML files" [37, page 157]. Modules can be classified as active or reactive. Typically, each module receives XML events from its previous module. Modules that react to another module's XML event stream are called reactive. Contrary to other modules, parser modules do not receive XML events from the pipeline. Instead, they obtain a byte input stream (typically an extension of java.io.InputStream) from a specified source. Consequently, a parser module is in control of its data source, hence it is classified as active.
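The pass-through/transform/absorb behavior of pipeline modules can be sketched as follows. The interface and classes here are purely illustrative and not the thesis's actual module API; events are simplified to plain strings, whereas the prototype transports richer XML event objects:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical module contract: receive an event, decide what to do with it.
interface PipelineModule {
    void processEvent(String event, PipelineModule next);
}

// A transforming module: rewrites the event, then pushes it onward.
class UppercaseModule implements PipelineModule {
    public void processEvent(String event, PipelineModule next) {
        next.processEvent(event.toUpperCase(), null);
    }
}

// A terminal module: collects whatever reaches the end of the pipeline.
class SinkModule implements PipelineModule {
    final List<String> received = new ArrayList<>();
    public void processEvent(String event, PipelineModule next) {
        received.add(event);
    }
}

public class PipelineDemo {
    // Push one event through a two-module chain and return the result.
    public static String runThrough(String event) {
        SinkModule sink = new SinkModule();
        new UppercaseModule().processEvent(event, sink);
        return sink.received.get(0);
    }
}
```

A module that simply forwards the unchanged event would realize the pass-through case, and a module that never calls the next module would absorb the event.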
3 Design Concepts
This chapter deals primarily with module designs. First, the pipeline's ability to encrypt or decrypt multiple times is illustrated. Then, two solutions for encryption modules are introduced. Finally, three design solutions for decryption modules are explained in detail.
3.1 Nested XML Encryption and Decryption
In some cases, it may be necessary to encrypt or decrypt XML elements multiple times. For example, a document can be encrypted completely so that the XML file only consists of one <EncryptedData> element, whose content can in turn contain further encrypted elements (nested encryption).
[Figure 3.1 shows a pipeline configuration in which a parser module is followed by several decrypter or encrypter modules, so that nested encryption layers can be processed in sequence.]

Figure 3.1: Event pipeline configuration for nested encryption or decryption
The figure illustrates an example configuration for the pipeline. In this case, it is able to decrypt a completely encrypted XML file and furthermore to decrypt an embedded <EncryptedData> element that becomes accessible after the first decryption step.
3.2 Stream Encryption
[Figure 3.2 shows the internal design of the encryption module: incoming events are collected in a buffer, the buffered characters are transformed through a cascade of a CipherInputStream and a Base64InputStream, and the result is emitted as CHARACTERS events.]

Figure 3.2: Internal design of the encryption module
Design Solution 1 - Simple Buffering In general, the encryption module receives events from the pipeline as illustrated in Figure 3.2. If the module is not encrypting, its task is to compare the name of all incoming XML events with the name of the element that is meant to be encrypted. The element name that should be encrypted is specified when the encryption module is created. Once a match is found, the encryption module enters the encryption mode. That means it buffers and encrypts the element's content and creates a new series of XML events to push a complete <EncryptedData> structure, carrying the base64 encoded ciphertext in its <CipherValue> element, to the next module.
Design Solution 2 - OutputStreams In this thesis, the focus is on decryption, so this design is only explained in brief and was not implemented. Because output streams can handle fragmented input data, they can be used to encrypt character chunks. Streams are memory efficient, hence a memory efficient solution should use a cascade of output streams to transform the unencrypted data (XMLEvents) into a sequence of encrypted and base64 encoded characters. Finally, a file writer module can write the new encrypted XML file block by block. This solution avoids the main disadvantage of the Simple Buffering design solution explained above, because it does not matter how large the source XML file to be encrypted is. Moreover, the memory usage should remain nearly constant during encryption.
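The output-stream cascade can be sketched with JDK classes. This is a hedged illustration, not the thesis design: AES-CBC with PKCS5 padding stands in for the block cipher, and a ByteArrayOutputStream stands in for the file writer module:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class StreamEncryptSketch {
    // Plaintext chunks -> CipherOutputStream -> base64 encoder -> sink.
    // Each chunk is processed as it arrives, so memory usage stays flat.
    public static String encryptChunks(String[] chunks, byte[] key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream base64 = Base64.getEncoder().wrap(sink);      // base64 stage
        try (CipherOutputStream enc = new CipherOutputStream(base64, cipher)) {
            for (String chunk : chunks) {
                enc.write(chunk.getBytes(StandardCharsets.UTF_8)); // fragmented input
            }
        }                                          // close() flushes cipher and base64 padding
        return sink.toString("US-ASCII");
    }
}
```

Because both the cipher stream and the base64 stage buffer at most one block internally, the source size does not influence the memory footprint.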
3.3 Stream Decryption
There is a difference in the meaning of the word "stream" depending on whether it is used in conjunction with decryption or with parsing. The following two paragraphs explain how the word is used in this thesis. As depicted in Chapter 2, an XML API can be streaming-based (event-based), for example SAX or StAX, because of its output. The output of a streaming XML API is a stream of XML events. The XML parser creates XML events while it is sequentially reading the XML source file or any other input stream. For this reason, subsequent software components can continue processing those events although the complete parsing process has not finished. Decryption can be processed with streams, i.e. a CipherInputStream or CipherOutputStream can be used. These streams extend the standard Java classes java.io.InputStream or java.io.OutputStream, respectively. Such a stream is a sequence of data that can be accessed sequentially by read() or write() methods. In general, streams are able to process just one single data unit of the stream (e.g. one byte, char, int, etc.) or complete blocks of data (e.g. byte[], char[], int[], etc.). The ability to handle blocks of data is very useful in the context of event pipeline concepts. Even so, there is a problem if event-based parsing and stream decryption come together:
[Figure 3.3 shows the core design problem: a parser module creates CHARACTERS events carrying char[] payloads, the decrypter module consumes them, and an internal parser behind the decrypter must read the decrypted output as if it were an ordinary input stream.]

Figure 3.3: Core design problem: interface between character events and the parser
Figure 3.3 illustrates an event pipeline that consists of a parser module and a decryption module with a subsequent temporary parser. This adjacent parser is created by the decryption module; for this reason, it is called the internal parser module. It resides virtually within an event pipeline module. Its task is to parse the decrypted output that it retrieves from the decryption module (decrypter).
3.3.1 Push-Pull Problem on Event Streams

Imagine two software components, one active and one reactive, as shown in Figure 3.4. The active component is in control of what it sends to the reactive component and when; the reactive component receives this data. In the context of the event pipeline, the active component is the previous module and the reactive component is the next module of the pipeline. A parser (or parser module) is always an active component, which means such a software component is in control of its source.
[Figure 3.4 shows three pairings: a push relation from an active to a reactive component, a pull relation between a reactive and an active component, and the problematic case of two active components pushing against each other.]

Figure 3.4: Push-pull problem on streams
The characteristics of a parser and of a stream of character events that are pushed from one module to the next module lead to a logical problem that is illustrated in Figure 3.4. Usually, the payload of a <CipherValue> element is pushed to the decryption module as a series of character events, while the internal parser that must process the decrypted data is itself an active component that pulls from its source. Two active components meet, and neither is designed to react to the other.
3.3.2 Event-Stream Decryption

The decryption process that decrypts string chunks in serial order is called event-stream decryption. Listing 3.1 shows a <CipherValue> element whose base64 encoded content is fragmented into several character chunks by the parser.

[Listing 3.1: a <CipherValue> element containing a long base64 encoded string]
Common streaming-based parsers do not create just one character event for arbitrarily long strings. In fact, they create multiple events (the Javolution parser is an exception). A decryption module that can decode and decrypt a stream of character events must be able to process this fragmented string. That means the decryption module must consider all characteristics of this string that are elucidated in Section 2.1: base64 encoding with padding, block cipher encryption with padding, and string fragmentation with unknown and varying chunk lengths. Another problem is that the string length is not known before all string chunks are read. For this reason, a subsequent decryption module cannot allocate the correct memory space for any buffering in advance.
Design Solution 1 - Simple Buffering A simple design for the decryption module uses a buffer that collects all incoming string chunks. This solution is also described in "A Stream-based Implementation of XML Encryption" by Takeshi Imamura, Andy Clark, and Hiroshi Maruyama [2]. Due to string splitting, the decryption module has to collect all character events that occur between the start tag and the end tag of the <CipherValue> element.
[Figure 3.5 shows the simple buffering design: all CHARACTERS events with their char[] payloads are collected in a single buffer, which introduces latency, before an internal parser processes the decrypted result.]

Figure 3.5: Decryption module that implements simple buffering
All incoming character events are added to a single buffer as shown in Figure 3.5. Concatenation of strings can influence the performance of the program ([8, 9]). It is important to use the most efficient implementation to create and fill a buffer that is meant to store strings. When the decryption module detects the end tag, it can be sure that the complete string is stored in the buffer. Then, the base64 encoded string is decoded into a byte array. Finally, the decryption module decrypts this binary data. This simple solution has one disadvantage: it uses much system memory. The complete character string, the base64 decoded binary data and the decrypted binary data all allocate space in memory. Despite that, the memory usage should be lower in comparison with the DOM API. Simple buffering is an easy to understand concept and its implementation is less complex than that of the other designs.
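The buffering steps above can be sketched as follows. This is a minimal illustration, not the thesis's ESBufferDecrypterModule: AES-CBC is assumed for the block cipher, and the onCharacters()/onEndTag() callbacks are hypothetical stand-ins for the module's event handling:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class BufferDecryptSketch {
    private final StringBuilder buffer = new StringBuilder();

    // Called once per incoming CHARACTERS event: collect the fragment.
    public void onCharacters(char[] chunk, int offset, int length) {
        buffer.append(chunk, offset, length);
    }

    // Called when the end tag is detected: decode and decrypt in one pass.
    public String onEndTag(byte[] key, byte[] iv) throws Exception {
        byte[] cipherText = Base64.getMimeDecoder().decode(buffer.toString());
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
        return new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
    }

    // Demo: encrypt, replay the base64 text as two character chunks, decrypt.
    public static String roundTrip(String plain, byte[] key, byte[] iv) throws Exception {
        Cipher enc = Cipher.getInstance("AES/CBC/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
        String b64 = Base64.getEncoder()
                .encodeToString(enc.doFinal(plain.getBytes(StandardCharsets.UTF_8)));
        BufferDecryptSketch module = new BufferDecryptSketch();
        char[] chars = b64.toCharArray();
        int half = chars.length / 2;
        module.onCharacters(chars, 0, half);                   // first fragment
        module.onCharacters(chars, half, chars.length - half); // second fragment
        return module.onEndTag(key, iv);
    }
}
```

A StringBuilder avoids the quadratic cost of repeated String concatenation, which is the efficiency point made above.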
Design Solution 2 - Thread-based Another, more sophisticated design solution to circumvent the push-pull problem on event streams uses system threads. A thread is a part of a process that is executed within the operating system, whereas a system process comprises all data that is created and used while a computer program is being executed. A thread, running within a process, is executed independently of other threads, but it can access resources within the same process. Resources can be shared among threads, but only if the shared resources are synchronized. That means if one thread is writing to a shared resource, all other access is blocked to prevent conflicts. If another thread requests access to the blocked resource, it is suspended until the resource is released by the occupying thread.
[Figure 3.6 shows the thread-based design: CHARACTERS events are written through a base64 output stream writer into a PipedOutputStream; a second thread reads the connected PipedInputStream, decrypts, and parses the plaintext, which introduces some latency.]

Figure 3.6: Decryption module that implements thread-based decryption
This can solve the push-pull problem. Figure 3.6 demonstrates the complex design. Each parser that is created by a module is executed within its own thread. The orange colored components in the figure are executed in a new thread that is created every time the decryption module has to parse the character events encapsulated by a <CipherValue> element.
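The thread decoupling via piped streams can be sketched as follows. This is a simplified illustration, not the thesis code: the string chunks stand in for character-event payloads, and the consumer thread stands in for the internal parser thread:

```java
import java.io.ByteArrayOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipeSketch {
    // The pushing side writes into a PipedOutputStream while a second
    // thread pulls from the connected PipedInputStream: both sides stay
    // active without either one buffering the complete data.
    public static String pushThroughPipe(String[] chunks) throws Exception {
        PipedOutputStream push = new PipedOutputStream();
        PipedInputStream pull = new PipedInputStream(push);
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        Thread consumer = new Thread(() -> {  // stands in for the internal parser
            try {
                int b;
                while ((b = pull.read()) != -1) {
                    collected.write(b);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        consumer.start();
        for (String chunk : chunks) {         // one write per character event
            push.write(chunk.getBytes(StandardCharsets.UTF_8));
        }
        push.close();                         // signals end of stream to the consumer
        consumer.join();
        return collected.toString("UTF-8");
    }
}
```

If the pipe's internal buffer fills up, the writing thread blocks until the consumer has read some bytes, which is exactly the synchronization behavior described in the paragraph above.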
3.3.3 Byte-Stream Decryption This decryption design is special because it avoids the concept boundaries induced by the event pipeline pattern: the pipeline is bypassed by a direct InputStream connection to the StAX parser module to solve the push-pull problem on event streams. Strictly speaking, this is a violation of the event pipeline concept. An advantage is that only as many bytes as required have to be read from the source. This could save network bandwidth if the source file is located on a distant storage and the accessing service does not require all XML data that is stored in the file. A prototype based on this design should allocate little system memory because it does not use extensive buffering or create any in-memory XML tree structure. The complete processing chain uses streams (Input- and OutputStreams) and no buffering except the byte chunks that the streams use internally. Streams are used for all transcoding and decryption to implement a continuous streaming functionality.
Design Solution 3 - StAXCharactersInputStream The character input stream approach circumvents the event pipeline concept. A hierarchy of input streams is used instead of the character events that are provided by the previous pipeline module. Figure 3.7 shows where the class StAXCharactersInputStream is located in the concept design; in the following, its mode of operation is described.
[Figure: the StAX parser module hands a CharactersInputStream (CIS) to a chain of Base64InputStream and CipherInputStream objects; the decrypter module's StAX parser reads from this chain and pushes events to the next module until close() is called.]
Figure 3.7: Illustration of the CharactersInputStream component
This special stream is created by the StAXParserModule whenever a CipherValue element has to be decrypted.
4 Implementation
A prototype suite was implemented in Java to test the design concepts for stream encryption and decryption that are explained in Chapter 3. In this chapter, the prototype's source code is introduced in some detail. First, an overview of the source code structure is given. Then, a detailed explanation follows about the event pipeline's module manager as well as the parser modules, the encryption module, and the stream decryption modules.
Figure 4.1: Prototype suite package overview
Figure 4.1 shows a call graph of the test suite that was generated by the useful Eclipse plugin ispace (http://ispace.stribor.de/). The test suite consists of three main parts: the event
pipeline implementation, a conventional DOM API implementation, and the XMLStreamProcessing main class. This class implements the testing sequences that produce the values illustrated in the next chapter. There are two packages on the left side of the figure that represent the core functionality of the event pipeline: one is called pipeline and the other one modules. The next section is about the event pipeline source code that is contained in the pipeline package.
4.1 Event Pipeline
The XMLEventPipeline is an important class in the pipeline package. The next call graph (Figure 4.2) shows the inner body of that package and the methods of this class.
Figure 4.2: Inner body of the pipeline package
Because a pipeline can consist of many modules and its configuration is complex, a module management class is useful. The XMLEventPipeline class implements a simple module manager. There are methods to insert, add, or get modules. Furthermore, the module manager class is able to print the current pipeline configuration. Another task of the manager is to assure that the first module is a parser module. A process method initiates the pipeline, i.e. the process() method of the first module of the pipeline is executed. As mentioned above, this is always a parser module, and hence the parsing process is started.
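A minimal sketch of such a module manager might look like this; the interface and method names are illustrative and not the prototype's real API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative module interface: only parser modules drive the pipeline.
interface PipelineModule {
    void process();
    default boolean isParser() { return false; }
}

// Sketch of the module-manager idea behind XMLEventPipeline: modules are kept
// in order, the first one must be a parser module, and process() starts it.
public class EventPipeline {
    private final List<PipelineModule> modules = new ArrayList<>();

    public void add(PipelineModule m) { modules.add(m); }
    public void insert(int index, PipelineModule m) { modules.add(index, m); }
    public PipelineModule get(int index) { return modules.get(index); }
    public int size() { return modules.size(); }

    // initiate the pipeline: the first module must be a parser module
    public void process() {
        if (modules.isEmpty() || !modules.get(0).isParser())
            throw new IllegalStateException("first module must be a parser module");
        modules.get(0).process();
    }

    // small demonstration: a parser module sets a flag when processed
    public static boolean demo() {
        EventPipeline p = new EventPipeline();
        final boolean[] started = {false};
        p.add(new PipelineModule() {
            public void process() { started[0] = true; }
            public boolean isParser() { return true; }
        });
        p.process();
        return started[0];
    }
}
```

In the real prototype each module would push its output events to the next module in the list; the sketch only shows the management and start-up logic.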
Figure 4.3: Strongly simplified call graph of the pipeline implementation
Dependencies among the modules are illustrated in the call graph shown in Figure 4.3. The abstract class AbstractEventHandler is in the center of the graph and surrounded by its inheriting classes that all represent pipeline modules. On the left side, there are SAX and StAX parser modules. The SAX2 module consists of two classes (SAX2ParserModule and SAX2EventHandler) that are shown in the group that is surrounded by a rectangle. This module can be configured so that the Xerces, Crimson or Sun parser is used. There are three decryption modules:
• ESThreadDecrypterModule,
• ESBufferDecrypterModule, and
• StAXISDecrypterModule.
The first module implements the decryption module using threading; it is based on the design proposed in Section 3.3.2. The second module implements the decryption module that uses simple buffering. Finally, the third module implements the special design introduced in Section 3.3.3.
STAXISDecrypterModule The third module is the STAXISDecrypterModule and it uses the StAXCharactersInputStream to access the character string in between the CipherValue tags; this class extends java.io.InputStream. As you would expect, StAXCharactersInputStream returns -1 if the StAX parser has reached the end of the character events, i.e. points to an XML end tag (usually the closing CipherValue tag). Such an InputStream is not able to provide a correct return value for the available() method, because it is not known in advance how many character chunks can be pulled from the parser. Every InputStream is in control of its source, hence this solution works with pull parsers only. SAX, however, is a push parser and therefore it is impossible to implement a SAXCharactersInputStream class: such an implementation would have to pull the next character events from a push parser (the push-pull problem).
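The pull-based stream can be sketched as follows. The class name follows the text above, but the implementation details are assumptions; it returns -1 as soon as the surrounding end tag is reached:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch of the StAXCharactersInputStream idea: an InputStream that pulls
// CHARACTERS events from an XMLStreamReader on demand and signals end-of-
// stream once the surrounding end tag is reached.
public class CharactersInputStream extends InputStream {
    private final XMLStreamReader reader;
    private byte[] current = new byte[0];
    private int pos = 0;
    private boolean done = false;

    public CharactersInputStream(XMLStreamReader reader) { this.reader = reader; }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (done) return -1;
            try {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS) {
                    current = reader.getText().getBytes();
                    pos = 0;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    done = true; // e.g. the closing CipherValue tag
                    return -1;
                }
            } catch (XMLStreamException e) {
                throw new IOException(e);
            }
        }
        return current[pos++] & 0xff;
    }

    // demonstration helper: read all character content of the first element
    public static String readAll(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.next(); // advance to the start element
            CharactersInputStream cis = new CharactersInputStream(r);
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = cis.read()) != -1) sb.append((char) b);
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because each read() call drives the parser forward, this only works with a pull API such as StAX, exactly as argued above.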
4.2 Encryption Module
In this thesis, the focus is on decryption, and for this reason, only design solution one for the encryption module was implemented in the prototype (see Section 3.2). The next call graph (Figure 4.4) illustrates the internal body of the encryption module.
Figure 4.4: Call graph of the encryption module
The module goes through a certain sequence while encrypting. The figure shows some methods, for example EVENT_START_ELEMENT or EVENT_END_ELEMENT, placed in the order in which they are called to illustrate the sequence of the process. The sequence is explained in the following.
1. EVENT_START_ELEMENT - analyze incoming events and search for the start tag of the XML node that should be encrypted; if it is found do the following
2. initEncryptedDataElement - push the opening XML elements between EncryptedData and CipherValue to the next module
3. initWriterModule - serialize the XML node to be encrypted into a byte array buffer, i.e. all incoming XML events are redirected and serialized into a byte array buffer
4. EVENT_END_ELEMENT - analyze incoming events and search for the end tag of the XML node that should be encrypted; if it is found do the following
5. generateCipherValue - finalize encryption and encoding process: call CharactersEncrypter to encipher and encode the byte array buffer
6. pushCipherValue - create the CipherValue element containing the encoded cipher data and push it to the next module
7. closeEncryptedDataElement - push the closing XML elements between CipherValue and EncryptedData to the next module
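The serialization and enciphering steps of this sequence can be condensed into a short sketch; the class name is hypothetical and plain AES/ECB with the JDK base64 encoder stands in for the actual XML Encryption processing:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Sketch of the encryption sequence: the events of the target element are
// serialized into a byte buffer; on the end tag the buffer is enciphered and
// base64-encoded into the cipher-value text. Names are illustrative.
public class EncryptionSequence {
    public static String encryptElement(String serializedElement, byte[] rawKey) {
        try {
            // steps 2-3: redirect/serialize the XML node into a byte buffer
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            buf.write(serializedElement.getBytes(StandardCharsets.UTF_8));
            // steps 5-6: encipher the buffer and base64-encode the result
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawKey, "AES"));
            return Base64.getEncoder().encodeToString(cipher.doFinal(buf.toByteArray()));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The returned string corresponds to the text content that pushCipherValue would emit between the opening and closing cipher-value tags.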
4.3 Decryption Modules
In this section, the source code of the decryption modules is explained. As described in Section 3.3, there are three different design concepts contained in the decrypt package. However, the StAXISDecrypterModule implementation is not explained in detail because it violates the event pipeline concept. The internal body of the decrypt package and the internal class dependencies are shown in the next figure (4.5). In the following paragraphs, the source code of the other decryption modules is explained in detail.
Figure 4.5: Internal dependencies of the decryption package
ESBufferDecrypterModule This module implements the Simple Buffering concept (Figure 4.6).
Figure 4.6: Internal dependencies of the ESBufferDecrypterModule
Typically, when the closing CipherValue tag is detected, the following sequence is executed:
1. doFinal() function of the CharacterEventsDecrypter object is called
2. Base64 decoding of buffered data
3. decryption
4. creation of an internal parser
5. parsing the decrypted XML and push events to the next module of the pipeline
ESThreadDecrypterModule This module implements the second design solution that uses a threading approach to decrypt the event stream (Section 3.3.2, second paragraph). The next call graph shows the internal body (Figure 4.7).
Figure 4.7: Internal dependencies of the ESThreadDecrypterModule
As already described in the paragraph above, the character events encapsulated by the CipherValue element are decoded and decrypted through a chain of output streams into a piped stream, and the decrypted XML is parsed by a parser running in a separate thread.
4.4 Modification of the Javolution Parser Source Code
In this section, the highly efficient Javolution parser is introduced [27]. During testing of the prototype, it was found that the common streaming parser implementations are not as fast and efficient as expected (see the next chapter for results): the memory usage rises over time. This is not the expected behavior of a streaming parser and seems to be a common problem (see https://issues.apache.org/jira for Xerces OutOfMemory bugs, e.g. Bug ID 6536111). In theory, such a parser should allocate a small and quite constant heap space. None of the prototype's parser modules showed the expected behavior in all test cases. Therefore, a separate test suite to test different parser solutions was implemented. The original implementation of the highly efficient Javolution parser does not split long character strings between two tags into chunks but reads them into its internal buffer at once; the parser was therefore modified to emit long character content in smaller chunks.
5 Performance Analysis
The prototype consists of multiple components that work together, and it is important to examine these components separately: the parser itself, the base64 decoding, the parser modules, and the decryption modules, for example. Therefore, this chapter is divided into sections that treat those components.
5.1 Experimental Setup
The following two tables list the hardware and software that were used for testing.

Test Hardware
• Processor: Intel Core 2 CPU T5500 @ 1.66 GHz
• Chipset: Intel i945GM Rev. 3
• Memory: MDT DDR2-800 2x2GB
• Hard Drive: Corsair SSD CMFSSD-128GBG2D

Test Software
• Operating System: Microsoft Windows 7 Professional x64
• Java VM: Java Software Development Kit version 1.6.0_21
5.1.1 Measuring Execution Times It is difficult to measure the performance of Java programs because of several factors of influence [35]. Moreover, the influence of these factors can vary, which can lead to measurement results that are difficult to reproduce. The Java Virtual Machine is not in an optimal state when it is instantiated: usually, programs are executed slowly at this time. This is the result of the dynamic mechanisms the JVM Just-In-Time compiler (JIT) uses to improve performance: it switches dynamically between already compiled code and interpreted Java bytecode [36]. Therefore, all performance tests are executed once before the actual measurements take place so that the JIT code generator has time to convert the Java bytecode into native machine code. There may be side effects due to caching and CPUs with many cores, especially if multiple threads are used by the program. For example, if the same encrypted XML file is decrypted multiple times, it is likely that the second run is faster than the decryption of a new XML file with other cipher data. For each run a new XML file is created beforehand to reduce the influence of hardware caching. These are the general conditions for the performance tests: • JVM Java bytecode compilation phase at the beginning, i.e. measurements in this warm-up round are not counted into the results
• Separated runs, i.e. for each run a new XML file with random character data is created
• Median of multiple runs (15 times) if possible, i.e. times are measured in milliseconds for each run and stored in a list, then the median value is computed and returned.
• One CPU core is used, i.e. all cores but one are disabled by the operating system (the boot options can be configured with msconfig in Windows 7TM)
• Influence of anti-virus or other background software is minimized.
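The warm-up and median procedure described by these conditions can be sketched as follows (the helper class is hypothetical):

```java
import java.util.Arrays;

// Sketch of the measurement procedure listed above: one unmeasured warm-up
// run for the JIT compiler, then several measured runs whose median is taken.
public class MedianTimer {
    public static long median(long[] values) {
        long[] copy = values.clone();
        Arrays.sort(copy);
        return copy[copy.length / 2]; // middle value for odd run counts (e.g. 15)
    }

    public static long medianMillis(Runnable task, int runs) {
        task.run(); // warm-up round: not counted into the results
        long[] times = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            task.run();
            times[i] = System.currentTimeMillis() - start;
        }
        return median(times);
    }
}
```

Using the median instead of the mean reduces the influence of single runs that were disturbed by background activity.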
There are multiple ways to measure the execution time of Java programs ([23], Chapter 3). The prototype was programmed with Eclipse, so it is natural to use Eclipse plugins to examine the source code. This is a selection of Java performance analysis tools and plugins that were used during the development and implementation of the prototype:
• Eclipse Test & Performance Tools Platform (TPTP) http://www.eclipse.org/tptp/
• JProbe http://www.quest.com/jprobe/
• PerfAnal: A Performance Analysis Tool http://java.sun.com/developer/technicalArticles/Programming/perfanal/
• System time difference measurement (Runtime class)
Solution - System Method Finally, the System.currentTimeMillis() method was used to get the execution times illustrated in the following figures. Listing 5.1 shows the code that was inserted in the test suite to measure execution times.
//set start timestamp
long startTime = System.currentTimeMillis();

//some source code

//calculate span
long span = System.currentTimeMillis() - startTime;

Listing 5.1: Source code to measure execution times
5.1.2 Measuring Memory Usage To illustrate the advantages of event-based streaming and of the decryption designs proposed in this thesis, it is necessary to measure the memory allocated by the program during the parsing process. The analyzing program has to show the heap size over time. Furthermore, it must be possible to set the times of measurement precisely. In general, it is not easy to measure the memory usage of a Java program. A common tool is the Eclipse plugin TPTP, which can measure the allocated size and the number of calls of all classes that are instantiated during execution. Nevertheless, it turned out that its results were not meaningful enough. The Eclipse Memory Analyzer Tool (MAT) http://www.eclipse.org/mat/ was used to detect memory leaks. Eclipse can be configured so that it generates memory dumps when an
OutOfMemoryException occurs (set the VM argument -XX:+HeapDumpOnOutOfMemoryError). The dump is stored in an hprof file that can then be opened in MAT. Another tool is jConsole (jconsole.exe in the JAVA_HOME/bin directory). It can be connected to any Java process and depicts heap memory usage, CPU usage, and the number of threads and classes. Although jConsole shows the memory size over time, its update interval is at least one second and the absolute values cannot be processed further.
Solution - Runtime Method Despite all that, there is a solution that finally fulfilled the requirements. The Runtime class can be used to poll the number of bytes that are used by the running Java process (Chapter 5 in [23]). The next listing (5.2) shows the source code of the method that was used to calculate the current heap usage.
public static String getHeapSize() {
    //get runtime object
    Runtime rt = Runtime.getRuntime();
    //ask JVM to execute garbage collection
    rt.gc();
    //calculate allocated heap size
    long heapSize = rt.totalMemory() - rt.freeMemory();
    return Long.toString(heapSize / 1000);
}

Listing 5.2: Method to poll the actual heap usage
Whenever getHeapSize() is called, the JVM is asked to execute the garbage collection to release any objects that are no longer in use. After that, the heap size is calculated by subtracting the current free memory from the total memory that is allotted to the process. This method is placed in the source code wherever significant results are expected. It is controversial whether the gc() method should be called this way or not. However, the heap usage test results are much more significant when the garbage collection is invoked. To be sure, the source code that is embedded in the prototype to measure the heap usage is not executed when execution time is measured. A special HeapAnalyser class was developed to monitor the bytes that the prototype reads sequentially from the source. It invokes the getHeapSize() method whenever the next 200KB have been read. The HeapAnalyser class extends the FilterInputStream class.
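A sketch of this monitoring stream, with assumed names and the 200KB granularity from the text (a complete version would also override the single-byte read()):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Sketch of the HeapAnalyser idea: a FilterInputStream that counts the bytes
// flowing through it and takes a heap measurement every 200KB.
public class HeapAnalyser extends FilterInputStream {
    private static final int INTERVAL = 200_000;
    private long count = 0;
    private long nextMark = INTERVAL;
    final List<Long> samples = new ArrayList<>();

    public HeapAnalyser(InputStream in) { super(in); }

    private static long getHeapSize() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / 1000; // KB, as in Listing 5.2
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            count += n;
            while (count >= nextMark) { // every next 200KB read
                samples.add(getHeapSize());
                nextMark += INTERVAL;
            }
        }
        return n;
    }

    // demonstration helper: how many samples are taken for a given input size?
    public static int sampleCount(int totalBytes) {
        try (HeapAnalyser ha = new HeapAnalyser(new ByteArrayInputStream(new byte[totalBytes]))) {
            byte[] buf = new byte[8192];
            while (ha.read(buf, 0, buf.length) != -1) { /* drain */ }
            return ha.samples.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Wrapping the source stream this way ties the measurement points to the parsing progress instead of to wall-clock time.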
5.2 Excursion: Parser Analysis
This section deals with the memory usage of some SAX and StAX parser implementations. These parsers were analyzed, and the results are presented and explained in detail:
• StAX: Sun’s StAX implementation (JRE 1.6.0_21, XMLStreamReader)
• Javolution: high efficiency parser and the modified variant ([27], javolution.xml.sax.XMLReaderImpl)
• SAX2 Crimson (version 1.1.3)
• SAX2 Xerces (version 2.10.0)
Figure 5.1 and Figure 5.3 illustrate the heap usage over time during the parsing process. Two test cases were analyzed: the parsing of an unencrypted XML file with 200.000 elements and a completely encrypted XML document consisting of a long character string. The unencrypted file was encrypted by the prototype implementation using the encryption module. A completely encrypted XML file consists of one EncryptedData element whose CipherValue contains this long base64 encoded string.
[Chart: heap size in KB over kilobytes read while parsing an unencrypted file (200k elements, 5.8MB file size); left: same tag name, right: different tag names; curves: SAX2XERCES, SAX2JAVOLUTION, SAX2CRIMSON, StAXSUN, SAX2JAVOLUTIONMOD]
Figure 5.1: Parser: heap size over time during parsing of 200.000 XML elements that have either the same tag name or all different tag names
Parser Test: Unencrypted Files The figure above has two charts. On the left, the test file that was used contains 200.000 XML elements that all have the same tag name (file type A). On the right, the chart shows the measured heap size values when all 200.000 XML elements have different tag names (file type B). The x-axis shows time in both charts, i.e. the HeapAnalyser class executes the getHeapSize() method whenever the next 200KB of source bytes have been read. The allocated heap size in KB is shown on the y-axis. While parsing an XML file of type B the heap usage rises as shown in the right chart of Figure 5.1. The widespread parser Xerces, Sun's StAX parser, and Crimson show this behavior. In this test case, the original Javolution parser and especially the modified variant show their advantage over those conventional parsers. Based on the right chart, one may conclude that there is a memory leak in the tested parsers. The Eclipse Memory Analyzer Tool can discover such a memory leak and therefore it was applied to the Xerces parser.
Figure 5.2: Xerces Parser: MAT analysis results
Figure 5.2 shows the MAT analysis results. Most memory is accumulated in one instance of the org.apache.xerces.util.SymbolTable class. This class implements the string interning feature of Xerces (URI: http://xml.org/sax/features/string-interning). "All element names, prefixes, attribute names, namespace URIs, and local names are internalized using the java.lang.String#intern(String):String method" [16]. This feature can only be set to true. On the left of Figure 5.1, the results are illustrated when file type A is parsed. All curves are quite constant over the complete parsing process. Sun's StAX implementation allocates 20% more heap space than the other parsers, which all require about 400KB of system memory. In conclusion, the memory usage of conventional XML parsers (except the Javolution parser) is proportional to the number of different element names, prefixes, attribute names, etc. in the file. In contrast, the Javolution parser is very memory efficient, especially if the XML file contains many different element names.
Parser Test: Encrypted Files Figure 5.3 shows the test results for the case that long character strings must be read. Heap space is increasingly allocated by the original Javolution parser (SAX2JAVOLUTION) during the parsing progress. This is because the Javolution SAX parser in version 5.5.1 reads character sequences at once into its internal buffer array; it does not split the input stream into character chunks internally. For this reason, only the modified Javolution SAX2 parser is analyzed further, because it shows no rise in heap size during the parsing process in either test case. The other, more conventional parsers like Xerces or Sun's StAX parser behave as expected: long character strings are split into small character chunks that can be deallocated during parsing. This results in low memory usage over time as shown in Figure 5.3.
[Chart: heap size in KB over kilobytes read while parsing a completely encrypted file (8MB character string); curves: SAX2XERCES, SAX2JAVOLUTION, SAX2CRIMSON, StAXSUN, SAX2JAVOLUTIONMOD]
Figure 5.3: Parser: heap size over time during parsing a completely encrypted XML document
5.3 Excursion: Base64 Decoding Performance
There are different types of base64 decoding implementations, and it is not obvious which one to choose for the decryption modules. Therefore, a rough performance analysis was done to ascertain the performance of different base64 decoding implementations. Both Apache (org.apache.commons.codec.binary.Base64) and iharder (iharder.sourceforge.net) provide their own library.
[Bar chart: base64 decoding times in milliseconds for the Apache and iharder implementations]
Figure 5.4: Base64 performance analysis
The base64 encoded source was about 2MB in size, and all tests were run 40 times. The times in Figure 5.4 are the median values of the forty runs each. On the y-axis the execution times in milliseconds are depicted, and on the x-axis the test cases InputStreams, OutputStreams, and Block Decoding are compared. On the one hand, a 2MB byte array is passed to the decoding method and decoded at once. This is called Block Decoding in the vertical bar graph. These are the methods:
• Apache: org.apache.commons.codec.binary.Base64.decodeBase64(byte[] b)
• iharder: Base64.decode(byte[] b)
On the other hand, a ByteArrayInputStream(byte[] b) object (or the output stream counterpart) is created and the pointer to the 2MB in-memory byte array is passed. Then, the byte array stream is wrapped by the base64 input stream (or output stream) that is analyzed. The measurements reveal that the Apache implementation is the fastest in each test case. Large differences appear if the distinct test cases are compared. As is evident in the graph,
• InputStreams are not fast Apache: org.apache.commons.codec.binary.Base64InputStream, iharder: Base64.InputStream
• OutputStreams are much faster Apache: org.apache.commons.codec.binary.Base64OutputStream, iharder: Base64.OutputStream
• and Block Decoding is the fastest Apache: org.apache.commons.codec.binary.Base64.decodeBase64, iharder: Base64.decode
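For illustration, the block and stream decoding variants can be reproduced with the JDK's java.util.Base64 (the measurements above used the Apache Commons Codec and iharder libraries instead):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

// Re-creation of the two decoding styles compared above, using the JDK's
// java.util.Base64: one-shot block decoding versus decoding through a
// wrapping InputStream.
public class Base64Comparison {
    // "Block Decoding": the whole byte array is decoded at once
    public static byte[] decodeBlock(byte[] encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // "InputStreams": the encoded bytes are pulled through a decoding stream
    public static byte[] decodeViaStream(byte[] encoded) {
        try (InputStream in = Base64.getDecoder()
                .wrap(new ByteArrayInputStream(encoded))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Both variants produce identical output; only the chunk-wise stream processing adds per-read overhead, which is what the measurements above quantify for the Apache and iharder libraries.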
Therefore, InputStreams were avoided as often as possible in the design concepts (Chapter 3). In case of Simple Buffering as a decryption module concept, Block Decoding is used. The event-stream decryption that uses a new thread for the parser uses the OutputStream class. Only the byte-stream decryption concept uses the InputStream class, because its design demands it.
5.4 Parsing Analysis
This section has two subsections: one is about memory usage and the other deals with execution times of the pipeline when an XML file is parsed. In the following, a simple pipeline configuration is considered. It consists of a parser module and an output module to simulate a simple pipeline without a decryption module. For the DOM API tests, the Apache example source code written by Vishal Mahajan (from the Apache SVN: org/apache/xml/security/samples/encryption/Decrypter.java) was modified slightly and used.
5.4.1 Memory Usage Figure 5.5 illustrates the heap size over time during the parsing process of 200.000 XML elements. Each element name is selected at random out of 150 different names. Test file size was about 6MB. The source file was not encrypted and hence the decryption modules (pipeline implementation) and the decryption method (DOM API) were disabled.
[Chart: heap size in KB over kilobytes read while parsing 200.000 elements (150 different names); curves: StAXSUN_Module, SAX2XERCES_Module, SAX2JAVOLUTIONMOD_Module, DOMAPI; DOM-only measurement points: Doc parsed, DecClass, Search done]
Figure 5.5: Parser modules: heap size over time during parsing of 200.000 XML elements (150 different names)
On the x-axis the progress of the parsing process is depicted, i.e. the labels on the x-axis are the times of measurement during the parsing process. The process starts on the left. Then, the program instantiates the pipeline and module objects if StAX or SAX is tested. If the DOM API implementation is tested, the times of measurement are right after the creation of the DocumentBuilderFactory and DocumentBuilder classes (DocFab and DocBuilder in the figure). Next, the heap size is periodically measured while the parser reads the source file. Finally, and only in the DOM API tests, the last three measurements are taken. The DocParsed heap size value is polled just after the DOM API parser (the SAX2 parser) has finished reading the file. DecClass is measured right before the decryption methods are invoked. The last value is polled after all XML elements have been processed by the DOM search method (getElementsByTagName). There are four curves in the chart. The memory usage of all streaming APIs is very similar, low, and constant over time. This is because the number of different element names is small. However, if all XML element names were different, the heap usage would rise over time for the parser modules that use Xerces or Sun's StAX parser. Only the Javolution parser module does not allocate more memory if all element names are different (see Figure 5.1). As expected from theory, the DOM API requires more heap space to store the document object model: the DOMAPI curve has a positive slope while parsing. This additional memory usage should be caused by the in-memory XML object generation. The Apache DOM API implementation uses the standard SAX2 parser, which is Xerces. The additional memory that is allocated by the pipeline implementation is very low in comparison to the pure parser test results in Figure 5.1.
5.4.2 Execution Time In this subsection, the performance of the parser modules is analyzed and compared to ascertain the efficiency and quality of the prototype's source code. This test does not measure the parser performance directly, as extensively done in related work [24, 25, 26]. Instead, the execution time of the parser working in a simple configuration of the event pipeline is measured, which is more realistic and meaningful in this context.
[Two line charts: execution time in milliseconds over the number of XML elements in the test file (left: 1k-10k, right: 10k-100k); curves: SAX2JAVOLUTIONMOD_Module, SAX2XERCES_Module, DOMAPI, StAXSUN_Module]
Figure 5.6: Parser modules: performance analysis
Figure 5.6 shows four curves in two line graphs: one for StAX, two for SAX, and one for the DOM API. In both charts the x-axis represents the number of XML elements in the XML test file. On the y-axis, the median execution time in milliseconds is illustrated. The left chart shows the execution times for XML files with 1.000 to 10.000 elements and the right one for 10.000 to 100.000 elements. The DOM API takes the most time to parse a file. In contrast, the SAX2XERCES_Module curve is predominantly below all other curves. Sun's StAX parser and the modified Javolution parser modules have similar execution times over the entire measuring range; these modules are just a bit slower than the fast Xerces parser module in this test case.
5.5 Decryption Analysis
To perform the following analysis, the pipeline configuration is extended by one more module: it now consists of a parser module, a decryption module, and the console output module with disabled output.
5.5.1 Memory Usage In this subsection, the influence of decryption to the memory usage is explained and illustrated.
Heap Usage Over Time In Chapter 3, several decryption designs were proposed; the next figure (5.7) illustrates the heap usage that they require.
[Chart: heap size in KB over kilobytes read while parsing and decrypting 200k elements (150 different names); measurement points include Start, End body, Process done/Doc parsed, and for DOM only Decrypt1, Decrypt2, Search done; curves: DOMAPI, StAXSUN_ESbuf, SAX2XERCES_ESbuf, SAX2JAVOLUTIONMOD_ESbuf, StAXSUN_ESThread, SAX2XERCES_ESThread, SAX2JAVOLUTIONMOD_ESThread]
Figure 5.7: Decryption modules: heap size over time during parsing and decryption of 200.000 XML elements
Figure 5.7 shows that the memory usage of the DOM API is always above all event pipeline solutions during the parsing process. After the source is parsed and the DOM is created, the decryption is done (Decrypt1 and Decrypt2 in the figure). When the first decryption is done (Decrypt1), about 50MB of additional heap space is allocated. This large increase is a clear disadvantage of the DOM API. There are three other curves in the chart that show the characteristics of the heap usage if simple buffering is used in the decryption module: StAXSUN_ESbuf, SAX2XERCES_ESbuf, and SAX2JAVOLUTIONMOD_ESbuf. These curves are superimposed and have steps. As mentioned above, this should be caused by the buffer array expansion. Once all encrypted bytes are stored in the buffer, the base64 decoding and decryption process is executed. For this reason, new byte arrays are created and filled with data, which should be the cause of the peak value at End body on the right side. The curves StAXSUN_ESThread, SAX2XERCES_ESThread, and SAX2JAVOLUTIONMOD_ESThread illustrate the heap usage if the pipeline uses thread-based decryption modules. These curves have a similar memory usage footprint: it is low and constant. However, if most of the XML
elements in the parsed XML file have different names, all streaming parsers except the modified Javolution parser cause increasing heap usage during the parsing progress of the pipeline (see Figure 5.1). Because the modified Javolution parser is very memory efficient while parsing an encrypted or unencrypted XML source, the SAX2JAVOLUTIONMOD_ESThread curve should never show any slope. As a result, the event pipeline design that uses thread-based decryption modules and the modified Javolution SAX2 parser is the most memory efficient implementation (SAX2JAVOLUTIONMOD_ESThread). It is able to release processed events and objects completely. In this chart (Figure 5.7) all streaming APIs allocate little memory because the number of different XML element names is small. Xerces, Sun's StAX, and Crimson cannot release all parsed objects and are therefore not suitable for XML files with many different XML elements. An advantage over the DOM API is that there are no memory usage peaks if the event pipeline design with thread-based decryption is used. Furthermore, the latency should be very low in contrast to the DOM API because data is already base64 decoded and decrypted while the parser is reading the source. Only if the parser module and the decryption module allocate heap economically is the pipeline concept superior to the conventional DOM API.
Heap Usage Over Elements The second paragraph in this subsection is about the scalability of proposed decryption designs. In the following line graph, the heap usage of different decryption solutions over 10.000 to 100.000 XML elements is compared. The time of measurement for each value is at the end of the decryption process, i.e. when the last element (