A Comparative Analysis of ASCII and XML Logging Systems Eric C

Air Force Institute of Technology AFIT Scholar Theses and Dissertations Student Graduate Works 9-10-2010 A Comparative Analysis of ASCII and XML Logging Systems Eric C. Hanington Follow this and additional works at: https://scholar.afit.edu/etd Part of the Information Security Commons, and the Programming Languages and Compilers Commons Recommended Citation Hanington, Eric C., "A Comparative Analysis of ASCII and XML Logging Systems" (2010). Theses and Dissertations. 1993. https://scholar.afit.edu/etd/1993 This Thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected]. A Comparative Analysis of ASCII and XML Logging Systems THESIS Eric Hanington, Civilian, USAF AFIT/GCO/ENG/10-17 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY Wright-Patterson Air Force Base, Ohio APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. The views expressed in this thesis are those of the author and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the United States Government. AFIT/GCO/ENG/10-17 A Comparative Analysis of ASCII and XML Logging Systems THESIS Presented to the Faculty Department of Electrical and Computer Engineering Graduate School of Engineering and Management Air Force Institute of Technology Air University Air Education and Training Command In Partial Fulfillment of the Requirements for the Degree of Master of Science Eric Hanington, Bachelor of Science in Computer Science Civilian, USAF Sept 2010 APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. AFIT/GCO/ENG/10-17 A Comparative Analysis of ASCII and XML Logging Systems Eric Hanington, Bachelor of Science in Computer Science Civilian, USAF Approved: /signed/ 10 Sept 2010 Dr. Rusty Baldwin, (Chairman) date /signed/ 10 Sept 2010 Dr. Michael Grimaila (Member) date /signed/ 10 Sept 2010 Dr. Robert Mills (Member) date AFIT/GCO/ENG/10-17 Abstract This research compares XML and ASCII based event logging systems in terms of their storage and processing efficiency. XML has been an emerging technology, even for security. Therefore, it is researched as a logging system with the mitigation of its verbosity. Each system consists of source content, the network transmission, database storage, and querying, which are all studied as individual parts. The ASCII logging system consists of the text file as source, FTP as transport, and a relational database system for storage and querying. The XML system has the XML files and XML files in binary form using Efficient XML Interchange encoding, FTP as transport using both XML and binary XML, and an XML database for storage and querying. Further comparisons are made between the XML itself and binary XML, as well as binary XML to ASCII text when comparing file sizes such as, ASCII = 2211.58 KB, XML = 3463.56 KB, and binary XML = 425 KB. As well as, transmission efficiency having times such as, ASCII = 12 msec, XML = 26 msec, and binary XML = 4 msec. XML itself is a poor choice compared to ASCII for hard drive being 1.6 times greater and network transport time being 2.2 times longer. However, in a binary form compared to ASCII, it uses less hard drive space, that is 0.19 times the size, and network resources being a 1/3. Because no XML databases support a binary XML, it is loaded without any optimization. The ASCII loads into the relational database with less time than XML into its database. ASCII's loading time could load in 1.9 seconds compared to the XML loading in 7.7 sec. However, querying each database, neither outperforms the other as one query results in shorter time for one, and another query results in a shorter time for the other. One query resulted in the relational database executing in 3 sec and the XML database in 85 sec. Where as, another query has relational database taking 37 sec and the XML database 6 sec. Therefore, XML and/or its binary form is a viable candidate for use as a comprehensive logging system. iv Acknowledgements First and foremost, I owe a large debt of gratitude to my family for their encour- agement and support. I also am in much debt to my advisor for his countless times in guiding me in the writing of my thesis. As well as my other committee members in their resources and advise given. Finally, for staff at the Air Force Institute of Technology for the resources that they let me use for my thesis. Eric Hanington v Table of Contents Page Abstract..................................... iv Acknowledgements . .v List of Figures . viii ListofTables..................................x I. An XML logging system? . .1 II. Background Material . .3 2.1 XML.............................3 2.2 Logs.............................6 2.3 XML Logs . .8 2.4 Querying Logs . 13 2.5 Summary .......................... 14 III. Methodology ............................. 15 3.1 Introduction . 15 3.2 Problem Definition . 15 3.2.1 Goals and Hypothesis . 15 3.2.2 Approach . 15 3.3 System Under Test . 16 3.4 System Services . 16 3.5 Workload .......................... 17 3.5.1 ASCII . 17 3.5.2 XML . 18 3.5.3 Binary XML/EXI . 19 3.6 Performance Metrics . 19 3.7 System Parameters . 19 3.8 Factors ........................... 21 3.9 Evaluation Technique . 22 3.10 Experimental Design . 23 3.11 Methodology Summary . 23 vi Page IV. Analysis of ASCII, XML, & EXI . 25 4.1 Introduction . 25 4.2 Computer . 26 4.2.1 Comparing file sizes . 26 4.3 Codec Methods . 29 4.3.1 Codec Time & Memory . 29 4.3.2 Time........................ 31 4.4 Network Transfer . 35 4.5 Database . 40 4.5.1 Loading database . 40 4.5.2 Querying database . 40 V. Conclusions . 53 5.1 XML Files . 53 5.2 Networking . 53 5.3 Database . 54 5.3.1 Loading data . 54 5.3.2 Querying data . 54 5.4 Further Research . 55 5.5 Conclusion . 55 Appendix A. Databases . 56 A.1 Relational Database . 56 A.1.1 Database Schema . 56 Bibliography .................................. 72 AuthorIndex ..................................1 vii List of Figures Figure Page 2.1. XML . .4 2.2. EXI vs gzip . .4 2.3. EXI vs ASN.1 . .5 2.4. EXI Encoding Speed . .6 2.5. EXI Decoding Speed . .7 2.6. XMLSchema............................8 2.7. BinaryLog.............................9 2.8. Binary Log Viewer . 10 2.9. ASCII log . 10 2.10. SEAL................................ 11 2.11. SEAL's XML log . 12 2.12. Common Event Expression chart . 13 2.13. Simple XQuery statement . 14 3.1. System Under Test . 16 4.1. Daily: ASCII scatter plot time to file size correlation . 39 4.2. Daily: XML scatter plot time to file size correlation . 39 4.3. Querying for System Arguments: time Daily Round 1 . 43 4.4. Querying for System Arguments: time Daily Round 2 . 44 4.5. Querying for System Arguments: time Halfday Round 1 . 44 4.6. Querying for System Arguments: time Halfday Round 2 . 45 4.7. Querying for System Arguments: time Hourly Round 1 . 45 4.8. Querying for System Arguments: time Hourly Round 2 . 46 4.9. Querying for Only System's third path: time Daily Round 1 . 49 4.10. Querying for Only System's third path: time Daily Round 2 . 50 4.11. Querying for Only System's third path: time Halfday Round 1 50 viii Figure Page 4.12. Querying for Only System's third path: time Halfday Round 2 51 4.13. Querying for Only System's third path: time Hourly Round 1 . 51 4.14. Querying for Only System's third path: time Hourly Round 2 . 52 ix List of Tables Table Page 3.1. Factors Table . 21 4.1. File Sizes Averages Summary (Kilobytes) . 26 4.2. File size (avg) Ratios: ASCII as base . 28 4.3. File sizes EXI Schema significant difference T-test . 28 4.4. File sizes EXI Coding-mode significant difference T-test . 29 4.5. Encoding times Average Summary (Seconds) . 30 4.6. Encoding memory Average Summary (Kilobytes) . 30 4.7. Decoding memory Average Summary (Kilobytes) . 30 4.8. Encoding time Schema significant difference T-test . 32 4.9. Encoding time Coding-mode significant difference T-test . 32 4.10. Encoding time to file size correlation (Pearson) . 33 4.11. Decoding times Average Summary (Seconds) . 33 4.12. Decoding time Schema significant difference T-test . 34 4.13. Decoding time Coding-mode significant difference T-test . 34 4.14. Decoding time to file size correlation (Pearson) . 35 4.15. Transmission times Averages Summary (seconds) . 36 4.16. Network Times (avg) Ratios: ASCII as base . 36 4.17. Transmission times Schema significant difference T-test . 37 4.18. Transmission time Coding-mode significant difference T-test . 37 4.19. Transmission time ASCII versus EXI significant difference T-test 38 4.20. Loading avg Time (sec) . 40 4.21. Loading XMLDB Daily Schema significant difference T-test . 41 4.22. Querying System's Arguments avg Time (sec) Daily . 42 4.23. Querying System's Arguments avg Time (sec) Halfday . 42 4.24. Querying System's Arguments avg Time (sec) Hourly . 42 x Table Page 4.25. Querying System's Arguments Memory (MB) Daily . 43 4.26. Querying System's Arguments Memory (MB) Halfday . 43 4.27. Querying System's Arguments Memory (MB) Hourly . 43 4.28. Querying System's Third path avg Time (sec) Daily . 48 4.29. Querying System's Third path avg Time (sec) Halfday . 48 4.30. Querying System's Third path Memory (MB) Daily . 49 4.31. Querying System's Third path Memory (MB) Halfday .

Load more