Converting Network Media Data into Human Readable Form: A Study on Deep Packet Inspection with Real-Time Visualization
Degree project
Converting Network Media Data into Human Readable Form
A study on deep packet inspection with real-time visualization.
Author: Steffen-Marc Förderer
Date: 2011-05-25
Subject: Computer Science
Level: Bachelor
Course code: 2DV00E

by © Steffen-Marc Förderer

A thesis submitted to the School of Graduate Studies in partial fulfilment of the requirements for the degree of Bachelor of Computer Science

Matematiska och systemtekniska institutionen
Linnéuniversitetet
05 2011
Växjö, Sverige

Abstract

A proof-of-concept study into network media capture and visualization through the use of packet capture in real time. An application was developed that captures TCP network packets and identifies and displays images in raw HTTP network traffic in real time, using search, sort, error detection and timeout failsafe algorithms. The application was designed for network administrators to visualize raw network media content together with its relevant network source and address identifiers. Different approaches were tried and tested, such as using Perl with GTK+ and Visual Studio C# .NET. Furthermore, two different image identification methods were used: raw magic string identification in pure TCP network traffic and HTTP MIME type identification, the latter being more accurate and faster. C# was seen as vastly superior in both speed of prototyping and final performance evaluation. The study presents a novel way of monitoring networks on the basis of their media content through deep packet inspection.
— LNU

Keywords

TCP/IP Monitoring, Network Sniffing, Network Monitoring, Image Reconstruction, Packet Data Reassembly, Real-time Visualization

Acronyms and Abbreviations

ACK - Acknowledgement Packet
API - Application Programming Interface
ARP - Address Resolution Protocol
ASCII - American Standard Code for Information Interchange
BMP - Bitmap image file / Image Type
CPU - Central Processing Unit
DNS - Domain Name Service / System
GIF - Graphics Interchange Format / Image Type
GTK - GIMP Toolkit
GZIP - GNU Zip / file compression
HEX - Hexadecimal / Base 16
HTTP - Hypertext Transfer Protocol
ICMP - Internet Control Message Protocol
IP - Internet Protocol
JPEG - Joint Photographic Experts Group / Image Type
LAN - Local Area Network
MAC - Media Access Control
MIME - Multipurpose Internet Mail Extensions / Content Types
OSI Model - Open Systems Interconnection Model
PCAP - Packet Capture API
PNG - Portable Network Graphics / Image Type
POE - Perl Object Environment
RAM - Random Access Memory
SSH - Secure Shell
SYN - Synchronize Packet
TCP - Transmission Control Protocol
TTL - Time To Live
URL - Uniform Resource Locator
XML - Extensible Markup Language

Acknowledgements

Many thanks to Welf Löwe and Mathias Hedenborg for their patience and help. Most of all I want to thank my family who has supported me throughout my degree.

“Too much technology, in too little time. And little by little ... we went insane.”
— Anonymous

Contents

1 Introduction
1.1 The Problem
1.2 Background
1.3 Research objective
1.4 Preliminary Requirements
1.5 Thesis Overview
2 Background and Related Work
2.1 Theoretical Framework
2.2 Background
2.2.1 Anatomy of a HTTP Request
2.3 Related Work
3 Design Requirements
3.1 Performance Requirements
3.2 Functional Requirements
3.3 Usability Requirements
3.4 Textual Use Case scenarios
3.4.1 Scenario 1
3.4.2 Scenario 2
3.4.3 Scenario 3
4 Implementation
4.1 Live Capture
4.2 Non-Live Packet Capture and Processing
4.3 Mechanics of Pcap
4.4 Sorting Packets into Connections
4.5 Image verification
4.6 Preventing Connection Buffer Overflow
4.7 Finding the image
4.7.1 Using Magic Numbers
4.7.2 Using HTTP Headers (MIME Types)
5 Getting the Network data
5.1 Switched/Non Switched Networks
5.2 The Real-time problem
6 The Graphical User Interface
6.1 Perl, Eclipse and GTK+
6.2 C#, Visual Studio and .Net
6.2.1 Implemented Functionality
6.3 Final Application Layout
7 Evaluation
7.1 Evaluation and Testing
7.1.1 Usability Evaluation
7.1.2 Functional Evaluation
7.1.3 Performance Evaluation
8 Conclusions and Future Work
8.1 Conclusion
8.2 Future Work
Bibliography

List of Figures

2.1 HTTP Get Request Header
2.2 HTTP Response with Header and Start of Image
2.3 York 1.55 for Windows
2.4 Driftnet 0.1.6 Linux/Solaris
2.5 EtherPEG 1.3a1 Mac OS X
4.1 Live Packet Capture Procedure
4.2 Initialize and Start Pcap
4.3 Connection Arrays
4.4 JPEG Beginning of file identifier
5.1 Simple Ethernet Hub
5.2 Simple Ethernet Switch
5.3 Ethernet Switch with MITM attack
6.1 Perl Prototype Application Overview
6.2 C# Application Overview
6.3 Image Scroll Buffers
6.4 Original Image Window
6.5 C# Application with Non Scrolling Image Display
6.6 C# Variable Image and Processing Speed Information
6.7 UML Use Case Diagram
7.1 Application used to simulate browsing behaviour
7.2 Image Loss (Live vs Non-live)
7.3 Image Loss (By Image Type)
7.4 Raw Test Data
7.5 Raw Test Data Cont.

1 Introduction

This chapter discusses the basic problems and issues that network administrators face in network content detection and classification. It introduces the objectives that this thesis aims to fulfil, along with a set of requirements.

1.1 The Problem

Network administrators today are faced with the complex challenge of monitoring and maintaining their networks and of developing better ways to analyse their networks' traffic. Often the tools used for this task consist of packet capturing programs that enable the administrator to diagnose numerous problems. These modern tools all support a variety of options to inspect and analyse traffic down to the basics of packet type and structure. The way that these packet capturing programs display their information to the administrator has been the same since the beginnings of the ARPA Network: simple text (ASCII, American Standard Code for Information Interchange) or hexadecimal (HEX) form. However, although powerful, these programs lack a simple feature when it comes to network content inspection. The fact that there are times when a network administrator is not concerned about the type of data packet but rather its actual data payload has been largely ignored by the development community.
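To make the ASCII/HEX presentation described above concrete, the following minimal Python sketch renders a payload the way a typical packet capturing program might. It is not part of the thesis application: the function name `hexdump` and the sample bytes (a plausible start of a JPEG/JFIF file) are assumptions chosen purely for illustration.

```python
def hexdump(payload: bytes, width: int = 16) -> str:
    """Render bytes as an offset, a hex column and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(payload), width):
        chunk = payload[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are replaced by '.', a common convention.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


# Hypothetical sample: bytes resembling the beginning of a JPEG/JFIF image
# as they could appear inside a TCP payload (illustrative values only).
sample = bytes.fromhex("ffd8ffe000104a46494600010101004800480000")

print(hexdump(sample))
```

In this sketch the only human-recognizable hint in the output is the string JFIF in the ASCII column; the image itself remains invisible to the reader.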
This is where all network monitoring programs fail. They are unable to analyse the packet payload to the extent that an administrator does not have to look at an ASCII conversion of the binary content in the packet. Take this scenario for example: a network administrator has been notified that illicit content is being transferred over the network and is asked to find out where the data is coming from and where it is being sent. This is very often the case for Internet Service Providers. The theoretically ideal way to go about this would be to employ firewalls with deep packet inspection that can recognize illegal or copyrighted data and prevent it from being forwarded through the network. Although great advances have been made in image recognition, no hardware or software has been created so far that can recognize whether an image or video fits certain content criteria. One still relies on a human to analyse and make choices in this scenario. The problem, however, is not solved, and no packet capturing program can convert a packet's payload into human readable / recognizable form when it comes to web media.

The problem therefore is that no packet capturing tools exist that are able to capture packet payloads, convert them into their original form and display this data to the user. This thesis aims to solve this issue by reassembling network data into its original form, which in our case is image data, and presenting it to the user in an interactive manner.

1.2 Background

The author of “Ethereal Packet Sniffing”, Angela Orebaugh, explains the importance of network analysis: “... is the key to maintaining an optimized network and detecting security issues. Proactive management can help find issues before they turn into serious problems and cause network downtime or compromise confidential data.
In addition to identifying attacks and suspicious activity, you can use your network analyser data to identify security vulnerabilities and weaknesses and enforce your company's security policy. Sniffer logs can be correlated with IDS, firewall, and router logs to provide evidence