IDENTIFYING APPLICATION PROTOCOLS IN COMPUTER NETWORKS USING VERTEX PROFILES

By

Edward G. Allan, Jr.

A Thesis Submitted to the Graduate Faculty of

WAKE FOREST UNIVERSITY

in Partial Fulfillment of the Requirements

for the Degree of

MASTER OF SCIENCE

in the Department of Computer Science

December 2008

Winston-Salem, North Carolina

Approved By:

Errin W. Fulp, Ph.D., Advisor

Examining Committee:

David J. John, Ph.D., Chairperson
William H. Turkett, Jr., Ph.D.

Acknowledgements

This thesis is the product of many people’s labors, not just my own. The ideas contained in the pages that follow have been formulated and refined for over a year, with the guidance and support of several people, whose assistance I would be remiss not to mention. I would like to thank Wake Forest University and GreatWall Systems, Inc. for their support. This research was funded by GreatWall Systems, Inc. via the United States Department of Energy STTR grant DE-FG02-06ER86274.¹

I would also like to thank my parents for their support throughout my years at Wake Forest, both as an undergraduate and as a graduate student. Without their encouragement and financial assistance, none of this would have been possible. I also would not be where I am today without the help of my friends, who have made these past several years some of the most enjoyable and most memorable yet.

My thesis committee members, Dr. David John and Dr. William Turkett, Jr., were instrumental in providing me with feedback throughout the research and writing process. Their comments and criticism have undoubtedly enabled the success of this endeavor. I would especially like to thank Dr. Turkett for selflessly spending hours assisting me and stepping in as my “adopted advisor” during Dr. Errin Fulp’s sabbatical.

Last, but certainly not least, I must thank my advisor, Dr. Errin Fulp. I have been fortunate to work with him in a variety of contexts for more than five years now, and he has been a tremendous influence on both my personal and academic development. His relaxed personality and great sense of humor kept me off-task just enough to save my sanity, while his insight and guidance allowed me to complete my studies and be ready to move on to the next chapter in my life.

Many thanks again to all who have helped me along the way — you are much appreciated.

¹The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the DOE or the U.S. Government.

Table of Contents

Acknowledgements...... ii

Illustrations...... vi

Abbreviations...... viii

Abstract...... x

Chapter 1 Introduction ...... 1
1.1 Issues in Network Management and Security ...... 2
1.2 Current Methods of Network Analysis ...... 2
1.2.1 Applications and Port Numbers ...... 3
1.2.2 Packet Inspection ...... 4
1.3 Interdisciplinary Study of Network Communications ...... 4
1.3.1 Social Networks ...... 5
1.3.2 Biological Networks and Motifs ...... 6
1.4 Outline ...... 7

Chapter 2 Computer Networks and Communications ...... 8
2.1 Network Topologies and Architectures ...... 8
2.2 Reference Models ...... 10
2.2.1 The OSI Model ...... 10
2.2.2 The TCP/IP Model ...... 12
2.3 Layer 3: The Network Layer ...... 13
2.4 Layer 4: The Transport Layer ...... 13
2.5 Layer 7: The Application Layer ...... 14

Chapter 3 Graph Analysis ...... 16
3.1 Graph Terminology and Basic Properties ...... 16
3.2 Types of Graphs ...... 17
3.3 Traditional Graph Measures ...... 18
3.3.1 Distances and Path Lengths ...... 18
3.3.2 Centrality Measures ...... 19
3.3.3 Clustering Coefficient ...... 21


3.3.4 Application of Traditional Graph Measures in Computer Networks ...... 22
3.4 Network Motifs ...... 22
3.4.1 Definition of a Motif ...... 23
3.4.2 Function of Motifs ...... 24
3.5 Analysis of Application Graphs ...... 25

Chapter 4 Data Selection and Considerations ...... 26
4.1 Network Trace Files ...... 26
4.2 Challenges Associated with Network Data Collection ...... 26
4.2.1 Data Capture ...... 27
4.2.2 Privacy and Sanitization of Data ...... 28
4.2.3 Network and Data View ...... 29
4.3 Data Sources ...... 30
4.3.1 Dartmouth College Wireless Traces ...... 31
4.3.2 LBNL/ICSI Enterprise Tracing Program ...... 31
4.3.3 OSDI Conference Network Traces ...... 31
4.4 Protocol Selection ...... 32

Chapter 5 Experimental Methodology ...... 36
5.1 Hardware and Linux System ...... 36
5.2 Packet Capture and Storage ...... 37
5.3 Creation of Application Graphs ...... 37
5.4 Traditional Graph Measures ...... 39
5.5 Motif Analysis ...... 40
5.6 Vertex Profiles ...... 43
5.7 K-Nearest Neighbor Classification ...... 44
5.7.1 Measuring Profile Separation ...... 45
5.7.2 Cross Validation of Classification Results ...... 46
5.8 Genetic Algorithm Feature Weighting ...... 46
5.8.1 Overview of Genetic Algorithms ...... 47
5.8.2 Feature Weighting ...... 48

Chapter 6 Results and Analysis ...... 49
6.1 Preliminary Investigations ...... 49
6.2 Initial Results ...... 50
6.2.1 Traditional Graph Measure Profiles ...... 51
6.2.2 Motif-based Profiles ...... 54
6.3 Weighted Profiles and Key Attributes ...... 57

6.3.1 Attribute Weights of Traditional Graph Measures ...... 58
6.3.2 Attribute Weights of Motif-based Measures ...... 59
6.4 Comparison of Profile Types ...... 61
6.5 Considerations for Optimizing Classifier Performance ...... 63
6.6 Limitations of Current Approach ...... 66

Chapter 7 Conclusions and Future Work ...... 67

References ...... 71

Appendix A Examples of Application Graphs ...... 76

Appendix B Code Listings...... 78

Appendix C Test Parameters...... 85

Appendix D Additional Classification Results ...... 87

Vita ...... 88

Illustrations

List of Tables

4.1 Summary statistics of three trace files examined ...... 31

5.1 Graph orders for each application protocol ...... 38

6.1 Classification accuracy of 65 application graphs ...... 50
6.2 An example confusion matrix with three classes ...... 50
6.3 Confusion matrix of unweighted traditional graph measures ...... 52
6.4 Number of single and multi-class ties for traditional graph measures ...... 53
6.5 Confusion matrix of unweighted motif-based profiles ...... 55
6.6 Number of single and multi-class ties for motif-based profiles ...... 55
6.7 Percentage of original data used in motif-based profiles ...... 57
6.8 Attribute weights for traditional graph measures ...... 58

C.1 FANMOD test parameters ...... 85

D.1 Confusion matrix of 65 application graphs using motif frequencies ...... 87
D.2 Confusion matrix of weighted traditional graph measures ...... 87
D.3 Confusion matrix of weighted motif profiles ...... 87

List of Figures

1.1 Example output from NetStat ...... 3
1.2 Graphical depiction of a social network with two distinctly visible clusters ...... 6

2.1 Four network topologies: bus, ring, star and mesh [1] ...... 9
2.2 The OSI and TCP/IP reference models [2] ...... 11
2.3 An IP datagram header [2] ...... 13
2.4 UDP and TCP datagram headers [2] ...... 14
2.5 Example communication between a client and a web server ...... 15

3.1 A graph with five nodes and five edges ...... 17
3.2 Schematic view of motif detection [3] ...... 23
3.3 All 13 configurations of order 3 connected subgraphs [3] ...... 24


3.4 A feed-forward loop ...... 24

4.1 Tcpdump output containing timestamp, protocol, source IP, source port, destination IP, destination port, packet length and packet flags ...... 27

5.1 Overview of the proposed methodology and tools used ...... 36
5.2 Storing packets from a pcap file into a MySQL database ...... 37
5.3 A motif with colored vertices ...... 41
5.4 FANMOD edge-switching process for generating random networks [4] ...... 42
5.5 Arrays representing vertex profiles ...... 43
5.6 Single-point crossover of two binary strings ...... 48

6.1 Profile collisions for traditional graph measures ...... 54
6.2 Profile collisions for motif-based profiles ...... 56
6.3 Depiction of three application graphs: HTTP, AIM and SSH ...... 57
6.4 Accuracy of unweighted vs. weighted traditional graph measure profiles ...... 59
6.5 The ten highest-weighted motifs and their corresponding weights ...... 60
6.6 Accuracy of unweighted vs. weighted motif-based profiles ...... 61
6.7 Accuracy comparison of unweighted profile types ...... 62
6.8 Accuracy of single attribute classification ...... 64
6.9 Comparison of profile types as the size of the training set increases ...... 65

A.1 Application graphs depicting AIM communications ...... 76
A.2 Application graphs depicting DNS communications ...... 76
A.3 Application graphs depicting HTTP communications ...... 76
A.4 Application graphs depicting Kazaa communications ...... 77
A.5 Application graphs depicting MSDS communications ...... 77
A.6 Application graphs depicting Netbios communications ...... 77
A.7 Application graphs depicting SSH communications ...... 77

Abbreviations

Acronyms

AIM - AOL Instant Messenger™

API - Application Programming Interface

AUP - Acceptable Use Policy

DNS - Domain Name Service

FFL - Feed-forward loop

HTTP - HyperText Transfer Protocol

IANA - Internet Assigned Numbers Authority

IDS - Intrusion Detection System

IP - Internet Protocol

MSDS - Microsoft Directory Share

OSI - Open Systems Interconnection

P2P - Peer-to-peer

SANS™ - SysAdmin, Audit, Networking, and Security

SMTP - Simple Mail Transfer Protocol

SSH - Secure Shell

TCP - Transmission Control Protocol

UDP - User Datagram Protocol

VoIP - Voice over IP

viii ix

Symbols

|V| is the number of vertices in a graph

eij is an edge from vertex i to vertex j

deg(v) is the degree of vertex v

id(v) is the indegree of vertex v

od(v) is the outdegree of vertex v

N(v) is the set of nodes in the neighborhood of vertex v

e(v) is the eccentricity of vertex v

rad(G) is the radius of graph G

diam(G) is the diameter of graph G

d(u, v) is the distance between vertex u and vertex v

CD(v) is the degree centrality of vertex v

CB(v) is the betweenness centrality of vertex v

CC (v) is the closeness centrality of vertex v

xi is the eigenvector centrality of vertex i

C(v) is the clustering coefficient of vertex v

φ is a port number associated with an application (e.g., 80 for HTTP)

Abstract

Edward G. Allan, Jr.

Identifying Application Protocols in Computer Networks Using Vertex Profiles

Thesis under the direction of Errin W. Fulp, Ph.D., Associate Professor of Computer Science

Security and management of computer network resources exemplify two critical activities that challenge system administrators. They face potential threats from outside intruders as well as internal users who already have access to the organization’s assets. It is imperative that administrators are aware of what applications are being executed, but the use of data encryption techniques and non-standard port numbers presents difficulties that must be overcome.

To that end, this thesis introduces a novel method to identify application protocols based on the analysis of application graphs, which model application-level communications between computers. The performance of two types of node descriptions, called vertex profiles, is compared. “Traditional” vertex profiles characterize each node using several well-studied graph measures. Furthermore, this work uniquely applies motif-based analysis, which has previously been used primarily in systems biology, to the study of application graphs by creating a second type of vertex profile based on a node’s participation in statistically significant motifs. Machine learning techniques are employed to evaluate the importance of specific profile features. The experimental results, using a nearest-neighbor classifier, show that this type of analysis can correctly classify the applications observed with greater than 80% accuracy.

Chapter 1: Introduction

Managing and securing today’s critical data networks is a daunting and expensive task. According to INPUT [5], demand for vendor-furnished information systems and services by the U.S. government will increase from $71.9 billion in 2008 to $87.8 billion in 2013. This money funds such tasks as system modernization, information sharing, IT management and information security. As computer networks increase in size, speed and complexity, and malicious hackers develop more sophisticated attacks, traditional methods of managing and securing these networks begin to break down.

This thesis proposes a novel approach to identifying the actions of hosts within a network by examining the properties of application graphs, which model the social and functional interactions of hosts with one another at the software application level (e.g., HTTP, FTP, etc.). With the aid of machine learning techniques and algorithms, this method exploits graph characteristics of each host in the application graph, such as its connectedness, its position in the graph and the shapes of the subgraphs in which it is found. One distinct advantage of this approach is that classification can be performed “in the dark”, meaning that the packet payloads are either unavailable or have been encrypted, rendering deep packet inspection futile.

Knowing what activities users on the network are participating in is crucial to network administrators who must manage allocations, network configurations, performance, security and access policies. The following sections of this chapter provide background information and motivation for the study.


1.1 Issues in Network Management and Security

To protect itself from litigation and to help ensure the integrity of its network, an organization (such as a school, business, or government) will often develop an Acceptable Use Policy, or AUP. An AUP defines what behaviors are acceptable for internet browsing, what applications can be run by users and other relevant guidelines for usage. The SANS Security Policy Project [6] provides several resources and templates for such policies. Take, for example, a policy that does not allow users to run a personal web server using an organization’s computing resources. Identifying such behavior can help to preserve network bandwidth that is otherwise used for legitimate business activities.

Not only can failure to comply with an organization’s AUP waste computing resources, it can also have serious security implications. Continuing with the example above, running an improperly configured web server or hosting insecure web application files gives an attacker an easy point of entry into the network. A study performed by MITRE from 2001-2006 notes a sharp increase in the number of public reports of vulnerabilities that are specific to web applications [7]. For several years buffer overflow attacks had been the most common, but they were overtaken in 2005 by web application vulnerabilities such as SQL injection, cross-site scripting (XSS) and remote file inclusion. It is, therefore, in a network administrator’s best interest to ensure that the network is properly utilized in accordance with the policies and guidelines adopted by the organization.

1.2 Current Methods of Network Analysis

Several tools allow system administrators to determine which applications are being used on a network. This information assists them in the maintenance and protection of networked systems. Sophisticated users, however, are able to hide their activities, which could potentially include actions that are against the organization’s AUP, or worse yet, are illegal. This section examines a few of the tools used by administrators and identifies some of their weaknesses.

1.2.1 Applications and Port Numbers

When data is sent to a computer over a network, the destination port number identifies which application on the host computer should receive and process the data. Many applications use port numbers specified by the Internet Assigned Numbers Authority [8]. For example, FTP servers use ports 20 and 21, while web servers use port 80 by default. NetStat is a command line tool that shows information about network connections, both incoming and outgoing [9]. Figure 1.1 demonstrates the output of the NetStat command.

    $ netstat -ta
    Active Internet connections (servers and established)
    Proto Recv-Q Send-Q Local Address      Foreign Address    State
    tcp        0      0 localhost:2208     *:*                LISTEN
    tcp        0      0 *:sunrpc           *:*                LISTEN
    tcp        0      0 *:auth             *:*                LISTEN
    tcp        0      0 *:35763            *:*                LISTEN
    tcp        0      0 localhost:ipp      *:*                LISTEN
    tcp        0      0 localhost:smtp     *:*                LISTEN
    tcp        0      0 localhost:36699    *:*                LISTEN
    tcp6       0      0 *:ssh              *:*                LISTEN

Figure 1.1: Example output from NetStat

Network administrators could look and see that a host on the network is listening on port 80, indicating the presence of a web server. The administrator could then shut down that service and take appropriate disciplinary action toward the user. The problem with this method of detecting network applications is that while many do run on a known port number, they do not necessarily have to. If a web server were reconfigured to listen for connections on port 6000, clients could still connect to it through their web browser by typing http://www.example.com:6000. A user wishing to hide their activities might attempt to disguise an application by using such a non-standard port number. Chapter 2 describes port numbers and other networking concepts in more detail.
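The weakness described above is visible in the services database that port-based tools consult. The short Python sketch below (an illustration, not part of this thesis's toolchain) looks up the service conventionally registered for a port using the standard library's `socket.getservbyport`; because the mapping only reflects IANA convention, the relocated web server from the example is not recognized as HTTP.

```python
import socket

def registered_service(port, proto="tcp"):
    """Look up the service conventionally registered for a port in the
    system services database (an IANA-derived list, e.g. /etc/services)."""
    try:
        return socket.getservbyport(port, proto)
    except OSError:
        return "unregistered"

# Port 80 resolves to the expected web service on most systems...
print(registered_service(80))
# ...but a web server moved to port 6000 is invisible to this lookup
# (on many systems port 6000 is registered to X11, a different application).
print(registered_service(6000))
```

The lookup answers "what usually runs here", not "what is actually running here", which is exactly the gap this thesis addresses.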

1.2.2 Packet Inspection

Another method of detecting network applications is to scrutinize the data contained in each packet as it traverses the network. Packets contain information such as HTTP requests, email headers and MP3 filename searches, as well as protocol-specific session initiations and version numbers that can be used to identify a particular application. Wireshark is a popular network protocol analyzer that has several useful features for viewing packet contents, reassembling sessions and gathering statistics about network data [10]. Packet inspection is commonly used in intrusion detection systems (IDS) such as Snort [11]. A rule-based engine searches packet data, compares it against a list of known attacks and generates a predefined response (such as notifying an administrator). The problem with packet inspection is that traffic is increasingly encrypted. Data payloads that have been transformed into ciphertext are not human-readable until they are decrypted with the appropriate key, nor do the payloads match the known attack strings in the case of an IDS.
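A toy illustration of why encryption defeats payload matching: the Python sketch below scans a payload for protocol fingerprints in the spirit of a rule-based engine (the signatures are invented for illustration and are not real Snort rules). The match succeeds on plaintext but finds nothing once the same bytes are scrambled.

```python
# Hypothetical protocol fingerprints; real IDS rule sets are far richer.
SIGNATURES = {
    b"GET /":   "HTTP request",
    b"SSH-2.0": "SSH banner",
    b"EHLO":    "SMTP greeting",
}

def inspect(payload: bytes):
    """Return the labels of all signatures found in the payload."""
    return [label for sig, label in SIGNATURES.items() if sig in payload]

plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
print(inspect(plaintext))        # ['HTTP request']

# XOR with a fixed byte stands in for encryption here: same data,
# but the bytes on the wire are opaque to the matcher.
ciphertext = bytes(b ^ 0x5A for b in plaintext)
print(inspect(ciphertext))       # [] -- the fingerprints no longer match
```

The approach developed in this thesis sidesteps this limitation by never looking at payload bytes at all.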

1.3 Interdisciplinary Study of Network Communications

It is therefore the goal of this study to look beyond current methods for identifying network behavior and propose a novel approach that relies upon high-level communication patterns observed among hosts. To accomplish this goal, this study borrows ideas and algorithms from several disciplines. Networks are not unique to computer science; they exist in mathematics, sociology, biology, communications and other areas of study as well. Graphs, collections of objects (sometimes called nodes) linked by edges, are the abstract model which allows for the analysis of any type of network. They can represent relationships among friends, the interaction of biological entities in a transcriptional regulation network, the collaboration between authors of research papers [12], as well as a myriad of other problem spaces. Chapter 3 illustrates the properties of graphs in more depth.

1.3.1 Social Networks

One key area of study that this thesis borrows from is social network analysis, which focuses on relationships among social entities (also known as actors), and on the patterns and implications of these relationships [13]. The properties of social graphs reveal interesting information such as the spread of disease or material goods through the network, as well as what actors are “influential” (politically, socially, etc.). Social network analysis also has military and intelligence applications. Yang and Ng provide visualizations and analysis of weblog social networks related to terrorism and other crime-related matters [14].

To provide a simple working example of social network analysis, Figure 1.2 depicts the author’s social network of friendships taken from the popular social networking web site Facebook™. There are two clearly visible “clusters” of friends in the graph, created by nodes in each cluster sharing many common links with other nodes in the cluster. In the context of this social network, it means that many of the author’s friends in each group are also friends with each other. The group on the left is primarily comprised of relationships formed during the author’s tenure at Wake Forest University, while the cluster on the right is primarily comprised of relationships formed prior to and during high school.

Figure 1.2: Graphical depiction of a social network with two distinctly visible clusters

Several concepts pertaining to social networks can be extended to the study of application graphs performed in this work. Application graphs model the social relationships between clients and servers in a computer network by showing with which web servers users choose to interact, with whom they communicate via instant messaging clients and with whom they choose to share files. For example, the application graph for AOL Instant Messenger™ might show several chat clients communicating with a central chat server, which then passes messages along to the intended recipients. Characteristics of these high-level interactions are used to identify the software application through which the communication occurs. Section 3.3 elaborates upon the graph measures frequently used to quantify aspects of social networks.

1.3.2 Biological Networks and Motifs

The study of biological networks is another key field from which ideas for this thesis are borrowed. Cellular processes are regulated by the interactions of several molecules such as proteins and DNA [15]. These complex interactions can be modeled as graphs. One particular method used to analyze these graphs is to search within them for motifs: recurring, significant patterns of interconnections. Milo et al. find motifs in several types of networks including biochemistry, neurobiology, ecology and engineering. They suggest that motifs are the basic structural elements capable of defining broad classes of networks [3].

Motif analysis is often used in biology [3, 16, 17, 18], but has not yet been applied to application graphs. One goal of this study is to determine if a motif or groups of motifs can help identify what application a computer is using. It finds that several protocols use similar motifs, partly due to the fact that many applications have a client-server architecture (described in Section 2.1). However, there is still enough distinction in how the applications are used at a social level to determine what they are based on the models developed in this work. Chapter 6 discusses some of the motifs found in application graphs.

1.4 Outline

The following is an outline of the remaining parts of this thesis. Chapter 2 covers information regarding computer networks and the different reference models, and details the network layers used to create application graphs. Chapter 3 introduces several concepts relating to graph theory and “traditional” measurement techniques of graphs, and provides more information about motifs. Data sources and application protocol selection are covered in Chapter 4. Chapter 5 specifies the tools used in this thesis and introduces the machine learning techniques used for the modeling and classification of application types. A discussion of the results obtained and an analysis of key motifs and graph metrics is handled in Chapter 6, as well as a comparison between traditional graph measures and a motif-based approach. Finally, Chapter 7 concludes this study and explores possible topics for future research.

Chapter 2: Computer Networks and Communications

Undoubtedly the interconnection of computers and networks to the world wide web has increased mankind’s ability to share information, perform research and become more efficient at everyday tasks. However, not all users have benign intentions. Illegal hacking, cyber terrorism and fraud wreak havoc on governments, corporations and individuals alike. Data encryption is often used to disguise malicious activity as well as legitimate activity from observation. By exploring the communication patterns found within networks, this study shows that it is still possible to gain some insight into what applications are being utilized. The following sections introduce several basic concepts related to network architectures, protocols and applications.

2.1 Network Topologies and Architectures

Network topologies describe the arrangement and mapping of networked elements, such as computers, printers, wires and routers. Mappings can be physical or logical. Physical topology describes where the elements are actually located and how they are interconnected with wires. Logical topology, on the other hand, refers to the path data appears to take when traveling from one network host to another [1]. A network’s logical topology might be very different from the underlying physical topology, but it is bound by the network protocols that direct how the data moves across the network. Application graphs are a generalization of logical topologies in that they provide a picture of how data moves between hosts, but from a very high-level view.

Figure 2.1: Four network topologies: bus, ring, star and mesh [1]

There are several shapes used to describe network topologies, including bus, tree, star, mesh and ring. In the case of a physical network, these shapes have an impact on network performance, reliability and ease of management. For example, a bus network is cost-effective and easy to implement, but the architecture can only support a limited number of hosts and a bad cable will bring down the entire network. A star network allows for the isolation of the periphery nodes, but the central hub might be a single point of failure for the network. Logical topologies show the exchange of information between entities that are not physically connected by the network infrastructure. For example, IBM’s Token Ring network technology is a logical ring but is physically wired in a star topology.

In terms of software application models, two prevalent architectures are found in computer networks: the client-server model and peer-to-peer (P2P) architectures. In the client-server model, a client machine is responsible for initiating a request to some application running on another computer. The server waits for an incoming request from a client and then sends a response. Client-server architecture allows for computing responsibilities to be divided up among servers in the network, where one computer might act as a web server, another as an email server and so on. While the data sent between the client and server might go through several network devices, the logical data flow is a single link between the two nodes. A star network could then be induced by several clients connecting to a common server (see Figure 2.1). In a P2P network, nodes both initiate and respond to requests from other computers on the network known as peers. Consequently, the logical topology of such interactions could form a mesh network. This study examines the characteristics of logical topologies extended to the application layer, modeled as application graphs.
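The structural difference between the two architectures shows up directly in simple degree statistics of the logical topology. The Python sketch below (hosts and flows are invented for illustration) counts destination in-degrees over application-level flow records: client-server traffic concentrates edges on a single hub, the star pattern, while P2P traffic spreads edges among the peers.

```python
from collections import Counter

# Hypothetical application-level flow records as (source, destination)
# pairs; the host names are made up for this example.
client_server = [("h1", "srv"), ("h2", "srv"), ("h3", "srv"), ("h4", "srv")]
peer_to_peer  = [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p1", "p3")]

def in_degrees(flows):
    """In-degree of each destination: a star topology concentrates
    edges on one hub, while P2P traffic spreads them out."""
    return Counter(dst for _, dst in flows)

print(in_degrees(client_server))  # one hub ('srv') with in-degree 4
print(in_degrees(peer_to_peer))   # degrees spread across the peers
```

Per-vertex statistics of exactly this kind, along with richer measures, form the vertex profiles developed later in this work.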

2.2 Computer Network Reference Models

Application graphs are created using information from several layers of the network communication process. Data goes through a series of transformations before being sent to its destination, including breaking the data into manageable fragment sizes, adding information, specifying how the data should be transmitted and converting it into the electrical pulses that traverse the wire. Three layers in particular are of interest: the network, transport and application layers, described in Sections 2.3–2.5.

There are two fundamental models referenced when describing network layers: the OSI model and the TCP/IP model. The protocols (rules that govern the syntax and meaning of data sent between entities) associated with the OSI model are rarely used, but the features described at each layer are still important. In contrast, the TCP/IP model is not as rigidly defined as the OSI model, but the protocols associated with it are widely used [2]. This section provides an overview of these models, depicted in Figure 2.2.

2.2.1 The OSI Model

The Open Systems Interconnection Basic Reference Model (OSI Model) was designed to promote international standardization of the protocols used in communication networks. There are seven layers in this model: the physical layer, data link layer, network layer, transport layer, session layer, presentation layer and application layer [19].

The physical layer deals with representing and transmitting raw bits over a communication channel. Well-known examples include Ethernet over twisted pair (10BASE-T, 100BASE-TX) and the 802.11a/b/g wireless standards. The task of the data link layer is to correct transmission errors from the physical layer and provide the means to enable point-to-point communication between hosts within a local area network. This layer arranges data into frames and also provides medium access control to share communication channels between multiple users.

The network layer determines how packets are routed from the source to the destination, allows the interconnection of heterogeneous networks and provides congestion control. The next layer in the model, the transport layer, provides logical communication between processes on the hosts and is the first true end-to-end layer in the model. The session and presentation layers are not generally used; their intent is to provide session management between hosts, synchronization, interruption recovery and “on the wire” management of abstract data structures. The final layer in the OSI model is the application layer. This is the layer at which a user directly interacts with the program (a web browser, for example) that sends network data.

Figure 2.2: The OSI and TCP/IP reference models [2]

2.2.2 The TCP/IP Model

First proposed in 1974, the TCP/IP model [20] presents a slightly different view of network communications with four layers that are not as strictly defined as those in the OSI model. Whereas the OSI model was developed before the associated protocols, the TCP/IP model was developed based on protocols that already existed, taking its name from its two key protocols. The host-to-network layer is somewhat ill-defined and does not specify the protocols necessary for a host to send packets to the network. It combines elements of the OSI model’s physical and data link layers. The internet layer is analogous to layer 3 of the OSI model. Familiar protocols like IP (Internet Protocol) and ICMP (Internet Control Message Protocol) are a part of this layer.

The third layer of the TCP/IP model is the transport layer, which maps directly to the transport layer of the OSI model. It allows for end-to-end communication of hosts on a network, using the TCP (Transmission Control) and UDP (User Datagram) protocols. A need for the session and presentation layers was not perceived, so the TCP/IP reference model does not contain them explicitly. The fourth layer, the application layer, will contain them if necessary. This layer contains all of the high level protocols such as HTTP, SMTP and DNS.

Although there are certainly similarities between several layers of the two reference models, this thesis will use OSI model terminology. This allows a finer distinction to be made between the network services offered at each layer. The important lower-level protocols for application graphs, however, are those originally associated with the TCP/IP model, namely TCP and UDP.

2.3 Layer 3: The Network Layer

The network layer is concerned primarily with delivering packets from one host to another through a series of routers. It attempts to maintain some quality of service for variables such as delay and transit time while forwarding packets along until the destination is reached.

Figure 2.3: An IP datagram header [2]

Figure 2.3 shows all of the fields contained in the header of an IP data packet. For modeling network communications, however, only two fields are of interest: the source address and the destination address. Each IP address identifies a unique node in an application graph. The protocol field tells the network layer which transport process to give the data to. Two common options are TCP and UDP, described next.
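The three fields of interest sit at fixed offsets in the IPv4 header, so they can be pulled straight out of the raw bytes. A minimal sketch, assuming a 20-byte header with no options (the function name and sample addresses are illustrative):

```python
import socket
import struct

def ip_fields(header: bytes):
    """Extract the protocol, source, and destination fields of an IPv4 header.

    Per RFC 791, the protocol number is byte 9, the source address
    occupies bytes 12-15, and the destination address bytes 16-19.
    """
    proto = header[9]
    src, dst = struct.unpack("!4s4s", header[12:20])
    return proto, socket.inet_ntoa(src), socket.inet_ntoa(dst)
```

A protocol value of 6 indicates TCP and 17 indicates UDP, telling the receiver which transport process should get the payload.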

2.4 Layer 4: The Transport Layer

The transport layer is responsible for getting data to and from applications running on the host machine, providing logical end-to-end communication between the applications. There are two types of service available to the upper layers: connectionless and connection-oriented. The simpler of the two is connectionless, implemented by UDP. The delivery and ordering of UDP packets is unreliable, but there is less connection overhead associated with the transfer. Connection-oriented service, provided by TCP, establishes several properties of the transmission ahead of time, such as data window sizes and congestion control mechanisms. TCP packets are given sequence numbers that are kept in order. Although IP networks are still only “best-effort”, as no resources are reserved ahead of time, TCP provides reliable communication between hosts.

(a) UDP header

(b) TCP header

Figure 2.4: UDP and TCP datagram headers [2]

TCP and UDP headers (Figure 2.4) contain fields for the source and destination port numbers. Port numbers serve as numerical identifiers for processes. They are 16 bits in length, resulting in 2^16 possible ports, numbered 0 through 65535. The Internet Assigned Numbers Authority (IANA) is responsible for maintaining assignments of port numbers for specific uses [8].
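Conveniently, both transport headers place the two 16-bit port fields first, so a single routine can read them regardless of protocol. A sketch (the function name is illustrative):

```python
import struct

def ports(l4_header: bytes):
    """Return (source port, destination port) from a TCP or UDP header.

    In both layouts (RFC 793 and RFC 768), the first four bytes hold the
    16-bit source and destination ports, in network byte order.
    """
    return struct.unpack("!HH", l4_header[:4])
```

For example, a client request to a web server might yield the pair (29985, 80).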

2.5 Layer 7: The Application Layer

The primary objective of this thesis is to identify application usage via communication patterns at the application layer. Although not 100% accurate, port numbers are used as the application labeling scheme for training the application classifier, described in Chapter 5. Some applications communicate on certain port numbers with a high degree of reliability. For example, when a user opens a web browser and requests a web page, a connection is established from a randomly assigned upper port number on the user’s computer to port 80 of the web server hosting the page. In this case, Hypertext Transfer Protocol (HTTP) is the layer 7 application protocol used, with the web server listening for connections on port 80, the IANA official port for the HTTP protocol. This process is depicted in Figure 2.5.

Source                     Destination
192.168.1.100:29985   →    208.122.19.56:80      User requests a web document
208.122.19.56:80      →    192.168.1.100:29985   Server responds to request

Figure 2.5: Example communication between a client and a web server

There is no shortage of application layer protocols. Common examples include SMTP or POP3 for email services, DNS for domain name resolution, peer-to-peer protocols like BitTorrent, and many others. This study focuses on seven applications that reflect a variety of application types and also have official port assignments from the IANA. Protocol selection is detailed in Chapter 4, while the steps taken to create application graphs based on the layer 3, 4 and 7 information are detailed in Section 5.3.

Chapter 3: Graph Analysis

Graphs are a well-studied concept in mathematics, dating back to Leonhard Euler’s 1736 analysis of the Seven Bridges of Königsberg, which laid many of the foundations of graph theory [21]. Simply put, graphs are a collection of objects with connections between them. These abstract structures model problems in a variety of areas, including logistics, communication systems, biological and chemical compounds and social-group structures [22]. The first part of this chapter reviews the basic concepts and terminology required by the study of application graphs and then introduces several “traditional” measures used to describe graphs. In the latter half of this chapter, network motifs are defined in terms of their graph characteristics and are related to application graphs.

3.1 Graph Terminology and Basic Properties

Unfortunately, some of the mathematical notation used in graph theory tends to differ from text to text. Many of the basic properties and definitions are standard, but for those that are not, this thesis borrows notation primarily from two sources: Chartrand and Zhang [23], and Busacker and Saaty [22]. Abbreviations and function-like syntax replace many Greek letters in this style of notation to avoid confusion. For example, x(G) indicates that x is a property of the entire graph, whereas y(v) indicates y is a property local to a particular vertex.

Vertices (or nodes, as they are often called in computer science) are the fundamental units in a graph. They can represent any object, such as a person, process, city, or computer. Vertices are linked together by edges, which show a relationship between the vertices they connect. Some examples include roads connecting cities, social interactions between people, or physical links between computers in a network. A graph is a collection of vertices and edges taken together. Formally, a graph G consists of a finite, non-empty set of vertices V, connected by a set of edges E, written as G = (V, E). This definition implies that a graph must have at least one vertex in it, but it does not necessarily have to contain any edges.

Figure 3.1: A graph with five nodes and five edges

The set of vertices V is written V = {v0, v1, ..., vk}. The cardinality of this set, |V|, is the order, or number of nodes in the graph. A graph’s edge set is defined as E ⊆ {{u, v} | u, v ∈ V}. For brevity, an edge can be written eij to mean an edge linking node i to node j. |E| is the number of edges in the graph, known as its size. The degree of a node, deg(v), is the number of nodes that v is adjacent to in the graph (those that can be reached by traversing one edge). This set of nodes is known as N(v), the neighborhood of v. In Figure 3.1, nodes 2 and 3 are adjacent to node 1, and N(1) = {2, 3}.
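These definitions translate directly into code. The sketch below uses a hypothetical five-node, five-edge set that is consistent with the text's description of Figure 3.1 (the actual figure may be drawn differently):

```python
# Hypothetical edge set consistent with the text: five nodes, five
# edges, with N(1) = {2, 3}.
V = {1, 2, 3, 4, 5}
E = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]}

def neighborhood(v):
    """N(v): every vertex sharing an edge with v."""
    return {u for e in E if v in e for u in e if u != v}

def degree(v):
    """deg(v) = |N(v)|."""
    return len(neighborhood(v))

order, size = len(V), len(E)  # |V| = 5, |E| = 5
```

With this edge set, neighborhood(1) returns {2, 3}, matching N(1) above.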

3.2 Types of Graphs

Modeling complex systems often requires more detail than just the nodes and edges described in the previous section. One possible approach is to orient the graph to show asymmetric relationships between objects. In an undirected graph, the edges are unordered pairs of vertices, that is, eij = eji. The edges in a directed graph, however, are ordered pairs, and eij ≠ eji. The degree measure can be extended to include the indegree, id(v), and outdegree, od(v), of a vertex, describing the number of vertices of G from which v is adjacent and the number of vertices in G to which v is adjacent, respectively. The associated undirected graph of a directed graph is obtained by disregarding the ordering of the end points of each edge.

The assembly line process for building an automobile can be modeled as a directed graph, where each stage of the process is represented by a node in the graph. The directionality of the edges indicates that each step follows in a specified order and that the process cannot happen in reverse. Edges of a graph can be weighted, usually with an integer or real number, to imply a “cost” associated with traversing an edge, or to further describe how the edge is used within the overall system. In the auto assembly line graph, an edge weight could represent the amount of time a particular step in the process takes.

3.3 Traditional Graph Measures

Several graph measures exist to describe the structure of a network, such as how connected a vertex is, its distance from other vertices, and how it is positioned in the graph. These measures have been used to characterize many different types of networks and describe their growth patterns [24]. The following sections define the measures selected for this study and provide examples of several of the concepts.

3.3.1 Distances and Path Lengths

The distance between two nodes u and v, written d(u, v), is the length of the shortest path between them. In an unweighted graph, this is equal to the number of edges in the path. In a weighted graph, the length of a path P is the sum of its edge weights, Σ_{e∈P} w(e). Dijkstra’s algorithm [25] is one common method for determining this path through a network.

For a vertex v in a connected graph, the eccentricity of v, e(v), is the distance between v and a vertex farthest from v in G. The radius of a graph is rad(G) = min{e(v) | v ∈ V} and the diameter is diam(G) = max{e(v) | v ∈ V}. A vertex is said to be central if e(v) = rad(G) and periphery if e(v) = diam(G).

In Figure 3.1 (reproduced above for convenience), e(1) = 3 because node 5 is the node farthest away from node 1 in the graph and requires traversing three edges to reach it. The radius of the graph rad(G) = 2 because e(1) = e(2) = e(5) = 3, but e(3) = e(4) = 2. Also, diam(G) = 3, the maximal eccentricity value of all nodes in the graph. According to the definitions above, nodes 3 and 4 are central, while nodes 1, 2 and 5 are said to be periphery nodes.
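The worked example can be reproduced with breadth-first search. The sketch below assumes a hypothetical five-node adjacency consistent with the eccentricities described in the text (the real Figure 3.1 may be drawn differently):

```python
from collections import deque

# Hypothetical adjacency consistent with the worked example:
# e(1) = e(2) = e(5) = 3 and e(3) = e(4) = 2.
ADJ = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

def eccentricity(v):
    """e(v): distance to the farthest vertex, via breadth-first search."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in ADJ[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

radius = min(eccentricity(v) for v in ADJ)    # rad(G) = 2
diameter = max(eccentricity(v) for v in ADJ)  # diam(G) = 3
central = {v for v in ADJ if eccentricity(v) == radius}  # nodes 3 and 4
```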

3.3.2 Centrality Measures

It is helpful to describe the centrality measures of a graph in terms of social networks in order to make an analogy: the centrality measures of a vertex indicate how important, prominent, or powerful the vertex is in a graph. The following is a brief examination of four common centrality measures proposed by Freeman and Bonacich [26, 27]. The most basic of these is degree centrality, defined as CD(v) = deg(v) / (|V| − 1). This equation can be modified for directed networks to produce CDin and CDout. In terms of social network analysis, indegree is interpreted as a measure of popularity, while outdegree is interpreted as gregariousness. In a dense adjacency matrix representation of a graph, the time required to calculate the degree centrality for all nodes is O(V^2), since all combinations of vertices must be considered.
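As a quick sketch of the normalization, degree centrality divides each degree count by the maximum possible degree, |V| − 1 (the adjacency-map input format is an assumption of this sketch):

```python
def degree_centrality(adj):
    """C_D(v) = deg(v) / (|V| - 1) for every vertex in an adjacency map."""
    n = len(adj)
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}
```

On a five-node graph, a vertex of degree 3 receives centrality 3/4.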

Betweenness centrality is the fraction of shortest paths between all pairs of vertices that pass through a particular vertex v. This measure is given by the equation:

CB(v) = Σ_{s≠v≠t∈V} δst(v) / δst    (3.1)

where δst is the number of shortest paths from s to t, and δst(v) is the number of shortest paths from s to t that pass through v. A vertex with a higher betweenness centrality occurs on more shortest paths than one with a lower value. This measure can indicate how “powerful” a vertex is, because it influences the spread of information through a network. O(V^3) calculations are required to determine betweenness and closeness (described next) using the Floyd-Warshall algorithm to find all shortest paths.
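For small graphs, Equation 3.1 can be evaluated directly with Brandes-style shortest-path counting. The following is a sketch for undirected, unweighted graphs given as adjacency maps, not a production implementation:

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality via Brandes-style shortest-path counting.

    `adj` maps each vertex to its set of neighbors (undirected, unweighted).
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Single-source shortest-path distances and path counts from s.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
        # Accumulate dependencies from the farthest vertices back toward s.
        delta = {v: 0.0 for v in adj}
        for u in reversed(order):
            for w in adj[u]:
                if dist.get(w) == dist[u] + 1:
                    delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if u != s:
                cb[u] += delta[u]
    # Each unordered pair {s, t} was counted from both endpoints.
    return {v: score / 2 for v, score in cb.items()}
```

On a simple path 1-2-3, only the middle vertex lies on a shortest path between the other two, so its betweenness is 1.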

Closeness centrality is defined as the average shortest path length between a vertex v and all other vertices reachable from it. In network theory it is regarded as a measure of how long it will take information to spread from one vertex to the other reachable vertices in the graph. Closeness centrality is given by:

CC(v) = ( Σ_{t∈V} d(v, t) ) / (n − 1)    (3.2)

where n ≥ 2 is the number of vertices reachable from v. Those vertices in G that have shorter paths to other vertices will have a higher closeness centrality.
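Equation 3.2 can be computed with the same breadth-first search used for distances. A sketch for unweighted graphs, taking n to be v plus the vertices reachable from it (one common reading of the definition):

```python
from collections import deque

def closeness(v, adj):
    """C_C(v): average shortest-path distance from v to the vertices
    reachable from it (Equation 3.2)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    n = len(dist)  # v plus every vertex reachable from v
    if n < 2:
        return 0.0
    return sum(dist.values()) / (n - 1)
```

On a path 1-2-3, the middle vertex averages distance 1.0 while an endpoint averages 1.5.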

The eigenvector centrality is a more sophisticated version of the degree count of a vertex, acknowledging that not all connections within a network are equal. The eigenvector centrality score of a vertex i is proportional to the sum of the centrality scores of i’s neighbors. In social networks, this reflects the idea that people connected to influential people will themselves be more influential than if they were connected to less influential people [28]. If the graph is represented as an adjacency matrix A, where Aij = 1 if node i is connected to node j and Aij = 0 otherwise, eigenvector centrality can be written:

xi = (1/λ) Σ_{j=1}^{|V|} Aij xj,    (3.3)

where λ is a constant, xi is the centrality score of vertex i, and xj is the centrality score of vertex j. Defining the vector of centralities x = (x1, x2, ...), the previous equation can be rewritten as

λx = A · x    (3.4)

To force the centralities to be non-negative, it can be shown that λ must be the largest eigenvalue of A, and x the corresponding eigenvector [28].

3.3.3 Clustering Coefficient

The clustering coefficient measure begins to extract more information about the shape of structures within the graph, whereas many of the previous measures rely on information about paths and path lengths between nodes. The clustering coefficient of v measures the number of edges that exist among the neighbors of v, divided by the number of edges that could possibly exist among them. For an undirected graph, the clustering coefficient is defined by the following equation:

C(v) = 2 |{ejk : vj, vk ∈ N(v), ejk ∈ E}| / ( deg(v)(deg(v) − 1) )    (3.5)

Another way to view clustering is as the ratio of triangles (three nodes connected by three edges) to the number of triples (three nodes and two edges, both incident to v) that exist in the neighborhood of v. It has been shown in some types of networks that if v1 connects to v2 and v2 connects to v3, then there is a greater chance that v1 and v3 will be connected as well [29, 28].
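Equation 3.5 amounts to counting the edges among a vertex's neighbors. A sketch for undirected adjacency maps:

```python
def clustering_coefficient(v, adj):
    """C(v): fraction of possible edges among N(v) that actually exist."""
    neighbors = sorted(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pair of neighbors to connect
    # Count edges e_jk whose endpoints both lie in N(v).
    links = sum(
        1
        for i, a in enumerate(neighbors)
        for b in neighbors[i + 1:]
        if b in adj[a]
    )
    return 2 * links / (k * (k - 1))
```

On the five-node example graph used earlier, vertex 1's two neighbors are themselves connected, giving C(1) = 1, while only one of the three possible edges among vertex 3's neighbors exists, giving C(3) = 1/3.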

3.3.4 Application of Traditional Graph Measures in Computer Networks

Past studies have looked at graph characteristics for the purpose of anomaly detection and traffic classification. Staniford et al.’s GrIDS system [30] generates graphs describing communications between IP addresses and can generate alerts based on a set of rules, such as a vertex degree count crossing some threshold value. The BLINC traffic profiling system developed by Karagiannis et al. examines the interactions between hosts to identify an application, and utilizes measures including degree counts and neighborhood information [31]. This thesis is similar to the BLINC study in that they both evaluate interactions among hosts at the functional and social levels in order to identify applications. The BLINC study, however, exploits additional information such as the transport protocol and average packet size and attempts to match network behavior to a library of empirically derived “graphlets”. In contrast, this study examines a wider variety of graph measures, and also proposes the unique approach of searching application graphs for motifs.

3.4 Network Motifs

A network motif is a pattern of interconnections that occurs in a graph significantly more often than it does in randomized networks. Studies performed by Milo et al. find motifs in several types of complex networks and show that a small number of network motifs occur repeatedly across network types. They describe motifs as fundamental building blocks of networks, capable of defining universal classes of networks [3, 16]. Research suggests that some motifs can be associated with a particular function, as discussed in Section 3.4.2. The work performed in this thesis extends this idea to application graphs to determine if particular motifs indicate what application protocol a host is using.

3.4.1 Definition of a Motif

In mathematical terms, a graph G′ = (V′, E′) is a subgraph of G if V′ ⊆ V and E′ ⊆ E. A motif, then, is any such subgraph that occurs significantly more often than in random networks. The level of significance required depends on the problem, but as an example, Milo et al. consider those patterns with a p-value of 0.01, meaning that there is only a 1% chance of seeing a particular pattern as many or more times in random networks than is observed in the original network [3]. Motif detection is depicted in Figure 3.2.

Figure 3.2: Schematic view of motif detection [3]

Generally speaking, motifs of order 3 or larger are considered when performing motif searches. However, searching for large motifs can be prohibitively expensive because of the computational complexity involved. Several algorithms [32, 33] have been developed to increase the efficiency of these searches and allow for the analysis of large networks containing thousands of edges and nodes. Figure 3.3 shows the thirteen possible directed edge combinations for motifs of order 3. In application graphs, the edge directionality indicates the flow of data between two hosts, such as a request from a client to a server, or the response from the server back to the client. Additional motif characteristics are described in Chapter 5.

Figure 3.3: All 13 configurations of order 3 connected subgraphs [3]
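A brute-force census of the connected order-3 subgraphs can be sketched as follows; real motif finders [32, 33] use far more efficient sampling and symmetry-breaking techniques, so this is illustrative only:

```python
from itertools import combinations, permutations

def triad_census(edges):
    """Count connected 3-node directed subgraphs, grouped by isomorphism class.

    Each class is keyed by the lexicographically smallest relabeling of its
    edge list over all 3! permutations of the node labels.
    """
    nodes = {v for e in edges for v in e}
    counts = {}
    for trio in combinations(sorted(nodes), 3):
        sub = [(a, b) for a, b in edges if a in trio and b in trio]
        # With only three nodes, the subgraph is connected iff at least two
        # distinct (undirected) pairs of them are linked.
        if len({frozenset(e) for e in sub}) < 2:
            continue
        key = min(
            tuple(sorted((p[trio.index(a)], p[trio.index(b)]) for a, b in sub))
            for p in permutations(range(3))
        )
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Comparing the resulting counts against those from randomized networks is what distinguishes a mere subgraph from a motif.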

3.4.2 Function of Motifs

Several studies suggest that motifs can be linked to specific functions within a network. Milo et al. analyze the motifs found in the direct transcriptional interactions in Escherichia coli and find three highly significant motifs [16]. Their study states that the appearance of network motifs at high frequencies suggests that they may have some specific functions in the information processing performed by the network.

A different study analyzes the feed-forward loop, or FFL (Figure 3.4). In an FFL, X regulates transcription factor Y, and both jointly regulate gene Z. Mangan et al. show that it acts as a sign-sensitive delay element, in that it responds rapidly to step-like stimuli in one direction (ON to OFF) and at a delay to steps in the opposite direction (OFF to ON). They argue that this type of control mechanism can filter out fluctuations in input stimuli [34].

Figure 3.4: A feed-forward loop (X → Y, X → Z, Y → Z)

3.5 Analysis of Application Graphs

The application graphs studied in this work are hybrid networks, reflecting a mix of social interactions and computer network architectures. Although there are no genes present that require precise regulation like in the biological networks discussed previously, network functions are carried out in a controlled environment that must follow a set of established protocols. For example, if a user wishes to talk to another user on a network via the AIM instant messaging service, each user must first authenticate and establish a connection to a central server; the computers do not simply send text back and forth between the two. Protocol behaviors are described in Chapter 4.

In terms of graph properties, application graphs are modeled with unweighted, directed edges and do not contain any self-loops. If a computer connects to a service running locally, the connection goes over the loopback interface, and is not visible on the network traces examined. The edge direction is set to match the observed traffic flow, which may be either unidirectional or bidirectional. If two computers communicate at any time during a period of monitoring, an edge is drawn between them. Edge weights are not used in this study, but may be considered in the future to provide further detail when determining the application type.

The traditional graph measures defined previously are appropriate for the study of application graphs because of the social aspect of the communications. Application graphs are formed through specific user actions, such as surfing the web, checking email, and sharing music. It is also for this reason that the study of motifs within application graphs is interesting. In systems biology, processes such as gene transcription and regulation are not voluntary tasks; cell survival depends on them. Chapter 5 details the methodology employed to describe application graphs based on their traditional and motif-based characteristics.

Chapter 4: Data Selection and Considerations

As is the case in any type of research, proper data selection is imperative for producing accurate results and analysis. This chapter examines several of the issues involved with the collection and sampling of computer network data in an effort to build a baseline measure for “normal” network behavior, and concludes with an overview of the seven application protocols selected for this study.

4.1 Network Trace Files

The pcap library provides the packet-capture and filtering engines of several popular network analysis and monitoring tools [35]. Some examples include tcpdump, nmap, Wireshark and the Snort IDS. Tcpdump in particular is a valuable tool for capturing packets as they come across a network interface card, a process known as “sniffing”, and logging them in a raw format which can then be analyzed by other tools, as shown in Figure 4.1. Although tcpdump is able to capture all of the data associated with each network packet, such as packet length, flags and checksum values, only a few of the fields specified by the IP, TCP, and UDP RFC documents [36, 37, 38] are needed to model application graphs: source IP, destination IP, source port and destination port. These four pieces of information are enough to uniquely identify a process running over a computer network between two hosts. The creation of application graphs is discussed in Chapter 3 and the implementation detailed in Chapter 5.
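As a sketch of how these four fields might be lifted from tcpdump's text output (the exact line format varies with tcpdump version and flags, so both the sample line and the regular expression here are illustrative):

```python
import re

# Matches lines of the form "... IP <src>.<sport> > <dst>.<dport>: ..."
FOUR_TUPLE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def parse_four_tuple(line):
    """Return (src IP, src port, dst IP, dst port), or None if no match."""
    m = FOUR_TUPLE.search(line)
    if m is None:
        return None
    src_ip, src_port, dst_ip, dst_port = m.groups()
    return src_ip, int(src_port), dst_ip, int(dst_port)
```

Everything else on the line (timestamp, flags, length) is discarded, since only the four-tuple is needed to build an application graph.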

4.2 Challenges Associated with Network Data Collection

Pang et al. identify three key goals of sharing network data with other researchers: verification of previous research, direct comparison of competing ideas on the same data, and a broader view than a single investigator can obtain on their own [39]. Unfortunately, there are several concerns that must be addressed, such as the amount of data collected, the accuracy of the data and the protection of users’ privacy. This section outlines a few of these issues.

Figure 4.1: Tcpdump output containing timestamp, protocol, source IP, source port, destination IP, destination port, packet length and packet flags

4.2.1 Data Capture

Increased utilization and line speeds of today’s high speed, high capacity networks present challenges for collecting network data in terms of data rate, storage and processing [40]. A packet sniffer can easily log hundreds of gigabytes of data in a single day, even on a moderately sized network. A study of traffic collected at Dartmouth College shows a significant increase in peer-to-peer, streaming multimedia and VoIP traffic, whereas initial network usage was dominated by web traffic [41]. Both static and streaming multimedia applications require significantly more bandwidth than simple web documents or other non-interactive file types. Research characterizing YouTube™ traffic found that 90% of videos requested by University of Calgary campus network users were larger than 21.9 MB [42], orders of magnitude larger than the file sizes of other content types.

In addition to requiring a great deal of storage space, high speed packet capturing also requires fast memory access and high disk speed so that packets can be written to disk before the capture buffer fills and loses packets. Although undesirable, this behavior does not affect the study of application graphs proposed by this study, which uses individual packets to establish a communication link instead of aggregated flows (all packets associated with a particular origin and destination pair). Two nodes in an application graph will be connected if any packets are sent between them, regardless of which part of the flow they come from: beginning, middle, or end. Therefore, partial flows are considered in these graphs.

Another advantage of using individual packets is that TCP and UDP sessions do not need to be defined. TCP connections are established by a three-way handshake between the client and server, and are terminated by a FIN and FIN-ACK sequence. The formal establishment or tear-down of a TCP session might not be correctly logged for several reasons: the sniffer could be turned on or off in the middle of the session, parts of the handshake could be dropped by the sniffer, or either the client or server could disconnect without following the closing protocol. UDP does not establish formal sessions like TCP does, so UDP flows are sometimes segregated by establishing a timeout value after which the flow is considered terminated if there is no activity. The edges in an application graph are binary in nature and only indicate whether or not host A communicated with host B.
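The resulting graph-construction step is therefore simple. A sketch, assuming packets have already been reduced to (source IP, source port, destination IP, destination port) tuples (the function name and the filter-by-port convention are assumptions of this sketch):

```python
def application_edges(packets, port):
    """Build the directed, unweighted edge set for one application.

    A packet contributes an edge whenever either endpoint uses `port`
    (the class label); session state is deliberately ignored, so packets
    from partial flows still produce edges.
    """
    edges = set()
    for src_ip, src_port, dst_ip, dst_port in packets:
        if port in (src_port, dst_port):
            edges.add((src_ip, dst_ip))
    return edges
```

Because the edge set holds at most one entry per ordered (source, destination) pair, repeated packets between the same hosts add nothing further, matching the binary nature of the edges.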

4.2.2 Privacy and Sanitization of Data

Monitoring network traffic may raise serious privacy concerns, as data sent in cleartext (i.e. not encrypted) is easily read by sniffing. Data such as usernames and passwords sent to web sites via the HTTP protocol instead of the encrypted HTTPS protocol can be effortlessly obtained by an attacker on the network. Even if sensitive information is not being sent, an attacker can log all text and images downloaded by a user as he or she surfs the web, and reassemble the browser sessions later.

Not only do researchers who collect this kind of data need to be sure to sanitize the resulting log files to ensure the privacy of users, but they must also disguise the IP addresses of machines on the network so that an attacker does not have a map of the network with which to launch an attack. Several methods and tools have been developed to accomplish these tasks, such as [43, 39, 44, 45].

Oftentimes a network administrator or developer does not need to log packet payloads to perform tasks such as verifying routes or debugging programs that utilize sockets. If this is the case, only the packet headers are logged and the rest of the packet is discarded. Storing only packet headers also helps alleviate the issue of storage space discussed in the previous section. This shortcut cannot be used in the case of signature-based intrusion detection systems, which rely on scanning the payload of a packet for known signatures that indicate an attack. The methods proposed in this thesis do not consider packet payloads, but only the information readily available in the packet header.

4.2.3 Network and Data View

Ideally, a “God’s eye view” of a computer network would reveal all communication links within the network as well as connections from within the network to other networks outside of it. Unfortunately, many sniffers are placed at gateway nodes at the edge of a network and only capture traffic leaving from and coming to the network. As a result, traffic originating from within a network and destined for internal servers (web and application servers, email servers, etc.) is not logged because it never reaches the gateway. Some data collection projects such as [46] attempt to address the lack of internal enterprise network traffic that is available for research.

One drawback of the research method proposed in this paper is that it currently assumes network activity for a particular application is limited to a single port. This is not true for out-of-band protocols such as FTP, which send authentication and control messages over one port but use another for data transfer. Even if provided with a complete view of the network, the data is segmented into individual port numbers for analysis. Therefore, network communications over multiple port numbers will not be visible. If a client connects to a web server on port 80, and that web server requests data from a MySQL™ server (default port 3306) or an IBM WebSphere® Application Server (default port 2809), only one part of the process is visible at a time: either client to web server, web server to database, or web server to application server. Seeing all components of a particular process would reveal interesting structural motifs, but the motif and node properties examined in isolation still hint at the function of the nodes. Possible techniques for aggregating data for different views are discussed in Chapter 7.

4.3 Data Sources

The data sets used in this study come from three different sources in an attempt to show measurable differences in protocols and behavior, even across networks with different underlying architectures and usage patterns. One data set often used in intrusion detection research is the 1998 & 1999 DARPA Intrusion Detection Evaluation Data Set [47]. The primary reason this data was not selected, however, is its age; as Henderson et al. point out, the type of traffic seen in computer networks has changed [41]. This is not to imply that the approach described in this thesis would not work with older data, but that newer network traces containing a wider variety of application use might prove more interesting to examine. Additionally, traffic for the DARPA initiative is synthetic, whereas the data sets described in this section contain real network data that reflects current trends in network and protocol use. Table 4.1 provides overview statistics for the traces examined.

4.3.1 Dartmouth College Wireless Traces

The CRAWDAD project at Dartmouth College provides an archive of wireless network data from several contributors around the globe. Included in the archive are 163 GB of packet headers captured from eighteen buildings on the campus during the Fall 2003 semester [48]. The data collected is representative of traffic in residential buildings, academic buildings, and the library. It has been sanitized in such a way that the IP addresses are consistent across traces, allowing for a more complete picture of network use. The campus wireless network contains several thousand users and over 450 wireless access points.

4.3.2 LBNL/ICSI Enterprise Tracing Program

The ICSI Enterprise Tracing Program hopes to provide a view into the internal traffic of an entire enterprise site [46]. These traces, taken from the Lawrence Berkeley National Laboratory (LBNL) in 2004 and 2005, span more than 100 hours of activity and include traffic from several thousand internal hosts. The data is sanitized in accordance with the methodologies described in [39]. Like the Dartmouth wireless traces, only packet headers were captured and the payload discarded.

                           Dartmouth      LBNL          OSDI
Capture length (seconds)   21818.575      600.079       193.348
Number of packets          2023527        2261261       324116
Avg. packets/sec           92.743         3768.274      1676.335
Number of bytes            1092602793     778659304     94814149
Avg. bytes/sec             50076.726      1297595.353   490380.757

Table 4.1: Summary statistics of three trace files examined

4.3.3 OSDI Conference Network Traces

The last source of data used for analysis in this paper also comes from the CRAWDAD archive, and includes traces from ten sniffers at the 2006 Operating Systems Design and Implementation (OSDI) Conference [49]. Researchers collected this data to enable the analysis of the behavior of a heavily used wireless LAN. The data was initially sanitized on-the-fly and then reprocessed off-line to further obfuscate the MAC addresses as necessary. Although this data set does not have the “enterprise” characteristics of the previous two, its inclusion helps to determine the generalizability of the methods proposed in this work to different networks and network points of view.

4.4 Protocol Selection

Several criteria were used to select the protocols examined in this paper, including availability, popularity and diversity. First and foremost, there must be enough data samples of a particular protocol in the trace files to be able to perform the graph characteristic and motif analysis. To achieve this goal, more well-known and widely used protocols were chosen. Also, protocols that have different architectures (client-server vs. peer-to-peer, for example) were selected in order to highlight the differences in node characteristics. Because packet payloads are not inspected, applications that operate on official IANA port numbers and are in-band protocols are used, so that reasonable assumptions can be made about the data and the port numbers accurately reflect the protocol being used. As a reminder, the port number is not used to classify applications, but only to provide class labels.

AOL Instant Messenger (AIM)

AOL’s instant messaging client has been a popular application for users around the world for over a decade. AIM uses a proprietary protocol called OSCAR to communicate with other clients [50]. Multicast architectures exist and are used by some chat programs such as IRC, but all AIM connections go through a centralized server. Users authenticate to the AIM login server on port 5190. Once the user’s session has been established, all chat communications also go through central AIM servers on port 5190. The exception to this is when a user attempts to establish a direct connection to another user (such as when sending pictures or other files), in which case the communication goes directly to the other user and bypasses the central AIM servers. Therefore, AIM is primarily a client-server application, with some peer-to-peer capabilities as well. This study restricts itself to communications on port 5190, so any direct file transfers are ignored.

HyperText Transfer Protocol (HTTP)

The HTTP protocol is used to retrieve hyper-linked text documents from the world wide web [51]. A client initiates an HTTP request by connecting to a web server, typically on port 80. The web server then responds with a status line, as well as another message including the contents requested, such as an HTML file or an image. HTTP is a stateless protocol, which means no information is retained between requests. This protocol falls directly into a client-server architecture model.

Domain Name System (DNS)

DNS is a hierarchical naming system that maps meaningful domain names to numerical IP addresses [52]. If a DNS server does not know the correct mapping for a given domain, it can instruct the DNS resolver on the client side of where to query next to attempt to resolve the address. DNS primarily communicates via UDP on port 53, and also follows a client-server architecture. Its hierarchical nature, however, makes it an interesting selection for analysis.

Kazaa

Kazaa is a peer-to-peer file sharing application built on the FastTrack protocol that operates on port 1214. This protocol employs supernodes for scalability: a supernode is any node on the network that also acts as a proxy and relay for the network, handling data flow and connections for other users. A peer-to-peer network should be more highly connected than a client-server model since all nodes in the network act as both clients and servers for each other.

Microsoft Active Directory (MSDS)

Microsoft Active Directory is a client-server protocol that provides a way to manage objects and relationships across a network. Objects can be resources such as printers, services such as email, or users (accounts and groups). It provides several services such as DNS-based naming, authentication methods and LDAP-like directory services. Active Directory Domain Services (MSDS) is the central location for configuration information, authentication requests and information about network objects [53]. It operates on port 445. Windows shares and Active Directory are commonly used in Windows-based networks, and the inclusion of MSDS for analysis provides an example of platform-dependent network traffic.

NetBIOS Name Service

NetBIOS (Network Basic Input/Output System) is used to allow applications on separate computers to communicate over a local area network. It provides three main services: (i) name service for name registration and resolution, (ii) session service for connection-oriented communication, and (iii) datagram distribution service for connectionless communication. The name service communicates over port 137 with either the TCP or UDP protocol. A computer, which has a unique host name, might have multiple NetBIOS names. The inclusion of NetBIOS for analysis is interesting because it often receives port scans and is frequently the target of malicious attacks. The architecture of NetBIOS communications is a bit unique in that it does not fall cleanly into a client-server model, nor does it fit the P2P model. It will occasionally use broadcast messages, and NetBIOS hosts can also be configured as peers.

Secure Shell (SSH)

Secure shell is a protocol that allows encrypted data to be sent between two computers on a network. It is often used for remote administration of other computers, creating secure tunnels for web browsing and securely copying files. SSH is primarily used in UNIX/Linux environments and runs on port 22. SSH utilizes a client-server architecture.

Chapter 5: Experimental Methodology

The analysis of application graphs involves several stages and requires the use of many different software tools. The major tasks include: parsing and storing network data, creation of graphs and vertex profiles, node property analysis, motif searching, and creating a classifier to predict application labels. Optimization of the classification process via feature weighting is also considered. This chapter describes the process as well as the tools used, which are open-source and freely available. For the reader’s convenience, a summary diagram is given in Figure 5.1.

[Figure 5.1 depicts the processing pipeline: parse and store data, then construct application graphs, which feed two parallel tracks — traditional profile creation and analysis, and motif-based profile creation and analysis — each followed by nearest neighbor classification and evolutionary attribute weighting. Tools used: Wireshark, Afterglow, Python, NetworkX, FANMOD, and RapidMiner.]

Figure 5.1: Overview of the proposed methodology and tools used

5.1 Hardware and Linux System

All tests were run on a multi-core system running the Linux kernel version 2.6.22. The system contains four dual-core AMD 64-bit processors running at 1.8 GHz each. It uses a shared-memory architecture with 8 GB of memory. Although most of the tools are not written to take advantage of multiple cores, the hardware architecture allows for analysis of multiple network traces to happen simultaneously.

5.2 Packet Capture and Storage

The network traces are in the pcap format as described in Chapter 4. Modified parsers based on those distributed as part of The Afterglow Project [54] were used to parse pcap files. Additionally, Wireshark [10] was used to extract basic information from the network trace files, including the source IP, destination IP, source port, destination port, timestamp, protocol and packet length.

tshark -t e -r input.pcap tcp or udp | python tshark2mysql.py t

Figure 5.2: Storing packets from a pcap file into a MySQL database

Once the packets have been parsed, they are stored into a MySQL™ database for later retrieval. This is done to facilitate later steps in the process so that packets can be selected based on their source or destination port numbers, protocol type, or other attributes. Figure 5.2 illustrates the process of parsing and storing information from input.pcap into a MySQL database table t. Each network trace file is stored in a unique table within the database.

5.3 Creation of Application Graphs

The next step in the process is to model the application graphs and analyze the traditional measures as described in the first half of Chapter 3. NetworkX is a package for the creation, manipulation and study of complex networks, written in the Python programming language [55]. Graphs are created by querying a MySQL database table for all entries for which either the source or destination port number matches the port number of one of the seven application protocols. Although port numbers do not always accurately reflect the application bound to them, they are generally a strong indicator, especially for the well known port numbers 0-1023 (e.g., HTTP servers typically listen on port 80 for connections). For the purposes of this work, the applications tied to each port number are assumed to be correct; however, verification is not possible because the packet payloads have been discarded.
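As a concrete illustration of this selection step, the sketch below builds an application edge set from parsed packet rows. The actual pipeline queries MySQL and constructs graphs with NetworkX; the row layout and function name here are hypothetical stand-ins, not the thesis code.

```python
def build_application_graph(rows, port):
    """Collect the directed edge set of an application graph from packet
    rows (src_ip, dst_ip, src_port, dst_port), keeping only rows where
    either port matches the application's port number."""
    edges = set()
    for src_ip, dst_ip, sp, dp in rows:
        if sp == port or dp == port:
            edges.add((src_ip, dst_ip))
    nodes = {n for edge in edges for n in edge}
    return nodes, edges

# Two HTTP packets and one SSH packet; only the HTTP hosts remain.
rows = [
    ("10.0.0.1", "10.0.0.2", 51000, 80),
    ("10.0.0.2", "10.0.0.1", 80, 51000),
    ("10.0.0.3", "10.0.0.4", 51001, 22),
]
nodes, edges = build_application_graph(rows, 80)
```

The resulting edge set would be handed to NetworkX as a directed graph for the measures described below.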

Graph Size

There are two possible approaches to consider when creating and comparing application graphs across different protocols. One approach is to collect network data for a constant amount of time and then study the resulting communications that occurred. For example, each application graph would represent ten hours of SSH communications, ten hours of HTTP communications, and so on. This approach is complicated for several reasons. The data collected for these experiments come from several different sources where the network monitors were run for variable lengths of time. Because certain applications are much more heavily used than others, there is no guarantee that there would be enough, or conversely, not too much, data for each protocol. Additional data pre-processing would be required to mitigate variables such as these.

Instead, this study attempts to analyze application graphs that have a similar number of participating nodes by allowing the network capture lengths to vary. By doing so, the number of hosts in each application graph (Table 5.1) can be more easily controlled, and the interaction patterns that form over a longer amount of time may be viewed. The order of each application graph is consistent within examples of a particular protocol, but not across protocols due to lack of availability.

Protocol  AIM  DNS  HTTP  Kazaa  MSDS  Netbios  SSH
Order      50   68    80     80    40       76   40

Table 5.1: Graph orders for each application protocol

The graph orders serve as an upper bound for the number of nodes considered in each application graph. For example, if ten network trace files are searched for

AIM traffic, the lowest number of hosts found communicating on port 5190 in any of the files becomes a limiting factor for the other trace files. If another file were to have 120 hosts using AIM, only the first fifty would be considered. However, all communications among those fifty hosts over the duration of the network trace would be added to the graph, not just the links created as each node is added.
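The two-pass capping logic described above can be sketched as follows; the packet tuple layout and function name are illustrative, not the thesis implementation. Hosts are admitted in order of first appearance until the cap is reached, and then every edge among the kept hosts over the whole trace is included.

```python
def cap_graph_order(packets, port, order):
    """First pass: admit hosts in order of first appearance on the
    application port, up to `order` hosts. Second pass: keep ALL edges
    among admitted hosts over the entire trace, not just the links
    seen while each node was being added."""
    kept = []
    for src, dst, sp, dp in packets:
        if sp != port and dp != port:
            continue
        for host in (src, dst):
            if host not in kept and len(kept) < order:
                kept.append(host)
    kept_set = set(kept)
    edges = {(src, dst) for src, dst, sp, dp in packets
             if (sp == port or dp == port)
             and src in kept_set and dst in kept_set}
    return kept_set, edges

packets = [
    ("A", "B", 40000, 5190),  # A and B admitted first
    ("C", "B", 40001, 5190),  # C exceeds the cap of 2
    ("B", "A", 5190, 40000),  # later B -> A traffic is still added
]
hosts, edges = cap_graph_order(packets, 5190, 2)
```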

Connected Components

It is natural to expect that not all nodes in an application graph are connected; groups of nodes exist that communicate with one another, but there is no communication that connects one group to the next. Scale-free networks, whose degree distributions follow a power law, usually contain one larger connected component and a few smaller connected components [24]. Several of the protocols exhibited this behavior of scale-free networks except for SSH, which showed a high number of small connected components. When calculating graph attributes for disconnected graphs, each connected component is treated as a separate graph. This ensures that measures such as distances between nodes, radius, diameter and others discussed in Chapter 3 are well defined and do not use infinite path lengths to represent nodes that are disconnected, as is done in some graph algorithms.
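Splitting a disconnected graph into its components can be sketched without NetworkX (which provides this directly) using a breadth-first search over the undirected adjacency:

```python
from collections import deque

def weakly_connected_components(nodes, edges):
    """Split a possibly disconnected application graph into weakly
    connected components (edge direction ignored), so each component
    can be analyzed as a separate graph with well-defined distances."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)  # treat the graph as undirected for connectivity
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            x = queue.popleft()
            if x in comp:
                continue
            comp.add(x)
            queue.extend(adj[x] - comp)
        seen |= comp
        components.append(comp)
    return components
```

Each returned component is then measured independently, so radius, diameter, and eccentricity never involve infinite path lengths.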

5.4 Traditional Graph Measures

After the application graphs have been created, NetworkX is again used to perform calculations on each connected component. There are eleven node characteristics (described in Chapter 3) examined in this approach: indegree, outdegree, total degree, clustering coefficient, betweenness centrality, degree centrality, closeness centrality, eigenvector centrality, eccentricity, whether or not the node is a center node, and whether or not the node is a periphery node. NetworkX provides functions to calculate all of these values except for eigenvector centrality, whose implementation is listed in Appendix B.
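As an illustration of the degree-related entries of a profile, the sketch below computes four of the eleven measures by hand. The thesis pipeline obtains these (and the remaining centrality measures) from NetworkX, so this is only a dependency-free stand-in; the (n − 1) normalization assumed for degree centrality follows NetworkX's convention.

```python
def degree_measures(nodes, edges):
    """Compute indegree, outdegree, total degree, and degree centrality
    (total degree divided by n - 1) for every vertex of a directed
    graph given as an edge set."""
    n = len(nodes)
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return {v: {
        "indegree": indeg[v],
        "outdegree": outdeg[v],
        "total": indeg[v] + outdeg[v],
        "degree_centrality": (indeg[v] + outdeg[v]) / (n - 1),
    } for v in nodes}

m = degree_measures({"A", "B", "C"}, {("A", "B"), ("A", "C")})
# "A" sends to both other hosts: outdegree 2, degree centrality 1.0
```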

5.5 Motif Analysis

For the task of motif searching, several network analysis tools were considered: FANMOD [32], mfinder [56], MAVisto [57] and Pajek [58]. FANMOD (Fast Network Motif Detection) was selected for its rich feature set, including support for a graphical user interface in addition to command line invocation, generation of motif images, the ability to export results in several different formats and support for node and edge colors. FANMOD employs algorithms [59] that allow it to search for motifs faster and with less memory usage than other motif searching tools such as MAVisto or mfinder.

FANMOD Parameters

FANMOD’s support for colored vertices and colored edges allows for the encoding of additional information into a motif structure beyond its shape and the directionality of its edges. This study assumes edges are directed, but further exploits the flow of information between nodes by defining three classes (colors) of hosts: client, server, and peer (see Figure 5.3). In computer networking these terms refer to nodes that are consumers of a service, providers of a service, or nodes that act as both consumers and providers of a service. Here, they take on a related meaning, but are defined somewhat more generally based on the source IP, source port and destination port of a packet.

Definition 1 Let φ be the port number associated with an application and v be a node in Gφ, the application graph of φ. Also, let P be a packet sent by v over the network, where Psp and Pdp are the source and destination ports of P, respectively. Client, server and peer are defined as follows:

• If Pdp = φ then v is a client node, labeled vc

• If Psp = φ then v is a server node, labeled vs

• If both the client and server conditions hold, then v is a peer node, labeled vp

As described in Chapter 2, a client computer will request a service by connecting from a random upper port on its own machine to a particular port φ on a server. Therefore, if the destination port of a packet sent by node v is the port number φ, then v is consuming the service provided on that port. For many protocols, the server will then send data back to the client from port φ. In this instance, the server becomes the source IP, sending data from φ, as specified in Definition 1. The third part of the definition describes the behavior of peers, computers that act as both “clients” and “servers”. Therefore any node that is found to both send and receive data on port φ is labeled as a peer. Edge colors are not currently used, but are considered for future work.

Figure 5.3: A motif with colored vertices
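Definition 1 can be sketched as a small labeling routine over packet tuples; the tuple layout is a hypothetical stand-in for the database rows, not the thesis code.

```python
def color_hosts(packets, phi):
    """Label senders per Definition 1: a host that sends packets TO
    port phi is a client, one that sends FROM port phi is a server,
    and one that does both is a peer."""
    to_phi, from_phi = set(), set()
    for src, dst, sp, dp in packets:
        if dp == phi:
            to_phi.add(src)    # candidate clients
        if sp == phi:
            from_phi.add(src)  # candidate servers
    colors = {}
    for host in to_phi | from_phi:
        if host in to_phi and host in from_phi:
            colors[host] = "peer"
        elif host in to_phi:
            colors[host] = "client"
        else:
            colors[host] = "server"
    return colors

packets = [
    ("C", "S", 40000, 80),  # C requests a service on port 80
    ("S", "C", 80, 40000),  # S answers from port 80
    ("P", "Q", 41000, 80),  # P consumes the service...
    ("R", "P", 42000, 80),  # (R acts as another client)
    ("P", "R", 80, 42000),  # ...and P also provides it: P is a peer
]
colors = color_hosts(packets, 80)
```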

Random Graphs and Statistical Significance

Milo et al. define “network motifs” as patterns of interconnections that occur in complex graphs at numbers that are significantly higher than those in randomized networks [3]. To determine which motifs are statistically significant, network data retrieved from the MySQL database for a particular application is converted into a FANMOD input file, which describes an application graph. First, the input graph is searched for all motifs of either order 3 or order 4. Next, a set of random graphs is generated and the motif search is repeated for each. The frequency at which motifs occur in the original input graph (the application graph) is compared to the frequency of those same motifs in the random graphs. Motifs that are found significantly more often in the original graph are then reported to the user.

Random graphs are created through a series of “edge switching operations” (Figure 5.4(a)), using the original input graph as a starting point. Several parameters exist to control the randomization process. In this study, the “local constant” model is selected, which means that unidirectional edges are only exchanged with other unidirectional edges. As a result, the number of bidirectional edges incident upon each vertex remains constant. Another option selected is to “regard vertex color” (Figure 5.4(b)), which indicates that edges should only be exchanged if their endpoints have the same color. These options were enabled to create randomized networks that are still structurally similar to the original network and allow for a more stringent comparison [3].

(a) Edge-switching operation (b) Regard vertex color

Figure 5.4: FANMOD edge-switching process for generating random networks [4]

There is some variability in defining the phrase “statistically significant”, as different thresholds can be used. The mfinder Tool Guide suggests using 5,000+ random graphs when searching for motifs of order 3, and 10,000+ random graphs when searching for motifs of order 4, and suggests that ten occurrences of any individual motif is a good starting point to measure the quality of a result [56]. For the motif analysis performed in this work, similar parameters were used. To keep the problem size reasonably small, 5,000 random networks were generated when searching for both order 3 and order 4 motifs; FANMOD supports sampling subgraphs for motif searching, but an exhaustive enumeration of all subgraphs is currently used. The FANMOD output files provide the user with several pieces of information, including the percentage of subgraphs each motif was found in (for the original networks as well as in random networks), and a p-value for each motif.

The p-value is a statistical measure that describes the probability of obtaining a result at least as extreme as the result observed, given that the null hypothesis, or expected outcome, is true [60]. If the p-value falls outside the range of the expected outcome and is less than some threshold value α, the result is said to be statistically significant at the α level. In practice, values of 5%, 2.5% and 1% for α are common. For this study a “significant motif” is any motif that occurs in at least 1% of subgraphs in the original graph and has a p-value of 0 — essentially those motifs that FANMOD determines are the most significant results. By setting the threshold at this p-value, the number of motifs considered for analysis can be limited to a more select group. Experimentally, this results in a list of 130 significant motifs.
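The thresholding rule can be sketched as a simple filter. The tuple layout below is illustrative only and does not reproduce FANMOD's actual output columns.

```python
def select_significant(motif_rows, min_freq=0.01, max_p=0.0):
    """Apply the significance rule used in this study: keep a motif if
    it appears in at least 1% of subgraphs of the original graph and
    its p-value is 0. Rows are (motif_id, original_frequency, p_value)
    tuples, a hypothetical stand-in for FANMOD's CSV fields."""
    return [mid for mid, freq, p in motif_rows
            if freq >= min_freq and p <= max_p]

rows = [
    (38, 0.052, 0.0),   # frequent and maximally significant: kept
    (46, 0.004, 0.0),   # too rare: dropped
    (14, 0.110, 0.02),  # p-value above the threshold: dropped
]
kept = select_significant(rows)
```

Applying this rule across all seven protocols is what yields the list of 130 significant motifs mentioned above.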

5.6 Vertex Profiles

The data collected through traditional graph analysis and motif analysis is used to create “profiles” of each node, used in the classification algorithm described in Section 5.7. Each profile is a data point in d-dimensional space, where d is the number of attributes in the profile. A list of n vertices labeled v1, ..., vn is written as follows:

v1 = [ a1, a2, a3, ..., ad ]
v2 = [ a1, a2, a3, ..., ad ]
 .
 .
 .
vn = [ a1, a2, a3, ..., ad ]

Figure 5.5: Arrays representing vertex profiles

The attributes a1 through ad can be any numerical data type or numerical representation of a data type. In the traditional graph analysis approach there are eleven attributes (degree counts, centrality measures, etc.), so d = 11. These attributes include integers, real numbers and boolean values represented as a 1 (true) or a 0 (false). The intent is to associate an application with a certain profile.

The idea of vertex profiles based on graph characteristics is adapted to the motif- based approach. Instead of considering the percentage of subgraphs a motif occurs in, however, a binary attribute is created that describes whether or not the vertex participates in the motif. One of the files output by FANMOD motif searches is a comma separated file with the following format:

adjacency matrix,

After the significant motifs have been determined, the script in Listing B.5 parses these files and creates the profiles for each node based on its participation in significant motifs. The dimensionality d of the motif profiles is 130: 42 of these are significant order 3 motifs, while the remaining 88 are significant order 4 motifs. The motif profiles were built putting both order 3 and order 4 motifs together because preliminary investigations indicated that the combination is more successful in separating and identifying protocols than either can do alone.
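Profile construction from motif participation can be sketched as follows; the motif ids and the `participation` mapping are illustrative, as the real data comes from parsing FANMOD's dump files (Listing B.5).

```python
def motif_profiles(hosts, participation, significant):
    """Build binary motif profiles: attribute i of a host's profile is
    1 if the host appears in at least one instance of the i-th
    significant motif, else 0. `participation` maps each motif id to
    the set of hosts seen inside instances of that motif."""
    return {h: [1 if h in participation.get(m, set()) else 0
                for m in significant]
            for h in hosts}

profiles = motif_profiles(
    hosts={"A", "B"},
    participation={"m1": {"A"}, "m2": {"A", "B"}},
    significant=["m1", "m2"],
)
# profiles["A"] == [1, 1]; profiles["B"] == [0, 1]
```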

5.7 K-Nearest Neighbor Classification

The tasks of node classification and feature weighting (Section 5.8) are handled by RapidMiner, an open source knowledge-discovery and data mining tool built on the Java™ platform [61]. RapidMiner allows for data mining experiments to be quickly constructed through the use of hundreds of modular operators that handle data pre-processing and post-processing, creation and storage of models, clustering and classification tasks as well as statistical analysis.

The k-nearest neighbor (k-NN) classification algorithm is a simple machine learning algorithm for classifying objects based on the closest training examples in a feature space. First, the data is broken into a training set and a test set. The proximity of a test point z to every point in the example set is then calculated.

Algorithm 1 The k-Nearest Neighbor classification algorithm [62]
1: Let k be the number of nearest neighbors and D be the set of training examples.
2: for each test example z = (x′, y′) do
3:    Compute d(x′, x), the distance between z and every example (x, y) ∈ D.
4:    Select Dz ⊆ D, the set of k closest training examples to z.
5:    y′ = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)
6: end for

After the nearest-neighbor list is obtained, the test example z is classified based on a majority vote of the k nearest neighbors to z. In this study, k = 1, so a test point z is given the same label as the label of its closest neighbor. In line 5 above, yi is the class label for one of the nearest neighbors, and I() is an indicator function that returns the value 1 if its argument is true and 0 otherwise.
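A minimal 1-NN classifier over vertex profiles, using the Euclidean distance of Section 5.7.1, might look like the sketch below. RapidMiner performs this step in the thesis; note that `min` keeps the first of several equidistant training points, which mirrors the first-label tie behavior discussed in Chapter 6.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def classify_1nn(training, z):
    """k-NN with k = 1: the test profile z receives the label of its
    single closest training profile. `training` is a list of
    (profile, label) pairs."""
    _, label = min(training, key=lambda pair: euclidean(pair[0], z))
    return label

training = [([0.0, 0.0], "AIM"), ([5.0, 5.0], "SSH")]
label = classify_1nn(training, [4.0, 5.0])  # closest to the SSH profile
```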

5.7.1 Measuring Profile Separation

A number of similarity measures can be used to determine the distance from one point to another (line 3 of Algorithm 1), the selection of which depends on the type of data being examined and its application [62]. For example, there is Euclidean distance, Jaccard coefficient, cosine similarity and simple matching coefficient. Euclidean distance is often chosen for instances of dense continuous data such as that found in the profiles for traditional graph analysis. Although the simple matching coefficient is often applied to binary data such as the motif profiles, the Euclidean distance is also suitable, and is selected for use in this study. Equation 5.1 defines this distance,

where n is the number of dimensions and xk and yk are the k-th attributes of x and y.

d(x, y) = √( Σ_{k=1}^{n} (xk − yk)² )    (5.1)

5.7.2 Cross Validation of Classification Results

Cross validation is the process of partitioning a data set into n subsets, training a classifier with n − 1 subsets and using the remaining subset to test. The process is then repeated n times with a different subset left out each time. In 10-fold cross validation, for example, ten subsets are created, each containing 10% of the original data set. In each iteration, 90% of the data is used for training and 10% is used for testing. To avoid the possibility of a particular subset not containing any instances (or very few) of a particular label, stratified sampling is used so that each subset contains roughly the same proportion of labels.
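A stratified partition can be sketched by dealing each class's examples round-robin across the folds. RapidMiner handles this internally; the sketch is only illustrative.

```python
from collections import defaultdict

def stratified_folds(examples, n=10):
    """Partition (profile, label) examples into n folds so every fold
    holds roughly the same proportion of each label: the examples of
    each class are dealt round-robin across the folds."""
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    folds = [[] for _ in range(n)]
    for items in by_label.values():
        for i, example in enumerate(items):
            folds[i % n].append(example)
    return folds

# 10 AIM and 10 SSH examples split into 2 folds of 5 + 5 each.
data = [([i], "AIM") for i in range(10)] + [([i], "SSH") for i in range(10)]
folds = stratified_folds(data, 2)
```

Each fold then serves once as the test set while the remaining folds train the classifier.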

5.8 Genetic Algorithm Feature Weighting

Genetic algorithms provide a unique way to investigate which attributes in the vertex profiles more effectively classify application protocols, as well as increase the accuracy of the nearest-neighbor classifier. This study utilizes a genetic algorithm to perform evolutionary feature weighting, the results of which are applied to each profile and a new classifier is built using the nearest neighbor algorithm as before. Alternatively, a brute-force search of all attribute combinations (given by Equation 5.2) might be possible for a small attribute set such as in the case of traditional graph analysis, but is not feasible for motif analysis.

c = Σ_{n=1}^{d} C(d, n)    (5.2)

Given that d = 11 for traditional graph analysis, applying the equation above reveals that the number of possible attribute combinations c is 2,047. However, when d = 130 for motif analysis, c = 1.36 × 10^39. Genetic algorithms present one possible way to explore this problem space within a reasonable amount of time.
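The counts quoted above are easy to verify directly, since the sum in Equation 5.2 over all non-empty attribute subsets equals 2^d − 1:

```python
from math import comb

def attribute_combinations(d):
    """Equation 5.2: the number of non-empty attribute subsets,
    sum over n = 1..d of C(d, n), which equals 2**d - 1."""
    return sum(comb(d, n) for n in range(1, d + 1))

# d = 11 (traditional measures) -> 2,047 combinations
# d = 130 (motif attributes)    -> 2**130 - 1, about 1.36e39
```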

5.8.1 Overview of Genetic Algorithms

Genetic algorithms view learning as a competition among a population of evolving candidate problem solutions [63]. During each generation, a fitness function (line 4 of Algorithm 2 below) assesses each candidate to determine if it will contribute to the next generation of solutions. Those solutions found to be the most “fit” are selected for mating and mutation and shape the following generation of potential solutions. The algorithm repeats until some termination condition is met, such as convergence to a solution or a predefined number of generations have been tested.

Algorithm 2 General form of a genetic algorithm [63]
1: Set time t = 0
2: Initialize the population P(t)
3: while the termination condition is not met do
4:    Evaluate fitness of each member of the population P(t).
5:    Select members from population P(t) based on fitness.
6:    Produce the offspring of these pairs using genetic operators.
7:    Replace, based on fitness, candidates of P(t) with these offspring.
8:    Set time t = t + 1
9: end while

Before the algorithm can begin, candidate solutions must be transformed into an appropriate representation for the problem space. Examples include binary, real value, and tree encoding, the simplest and most studied of which is binary encoding [64]. Initial populations of candidate solutions are usually chosen at random. The population size depends on the problem space, but studies have shown a population size of 20-30 generally yields good results [65, 66]. At this point, the fitness function evaluates each member of the population, and selects the best candidates for mating.

Figure 5.6 shows what a simple crossover of two binary strings might look like.

Input Bit Strings        Output Bit Strings
    0011|0001                0011|1011
    0100|1011      =⇒        0100|0001

Figure 5.6: Single-point crossover of two binary strings

Just like in evolutionary biology, there is a small chance for random genetic mutation to occur. In a binary string, this would equate to one of the bits being flipped from a 0 to a 1 or vice versa, allowing the algorithm to explore more of the problem space and not settle on a local solution. Previous research suggests variable values for mutation probability, such as 0.0001 [65] or 0.005 - 0.01 [66].
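The crossover and mutation operators can be sketched in a few lines; the bit strings follow the Figure 5.6 example, and the function names are illustrative rather than RapidMiner's.

```python
import random

def crossover(a, b, point):
    """Single-point crossover of two bit strings: swap the tails
    after `point` (Figure 5.6 cuts after bit 4)."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b
                   for b in bits)

children = crossover("00110001", "01001011", 4)
# children == ("00111011", "01000001"), matching Figure 5.6
```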

5.8.2 Feature Weighting

The RapidMiner distribution contains a prewritten test for evolutionary feature weighting using genetic algorithms. In the context of application identification, the function used to determine the fitness of candidate solutions is based upon whether or not the potential solution increases the overall accuracy of the 1-NN classifier. Solutions that do not increase the performance of the classifier are not selected to contribute to the following generation of candidate solutions. The algorithm is run for thirty generations, by which time the system should stabilize and begin to converge to a solution set of attribute weights. The full test parameters, including crossover probabilities, mutation rates and candidate selection can be found in Appendix C.

Chapter 6: Results and Analysis

To test the accuracy and performance of the proposed approaches, several experiments were run using the method described in Chapter 5. In total, 65 application graphs were examined: ten AIM, ten DNS, ten HTTP, five Kazaa, ten MSDS, ten Netbios and ten SSH, with the discrepancy resulting from fewer examples of peer-to-peer Kazaa traffic being located in the data traces that were downloaded. Profiles were classified using both traditional graph attributes and motif-based attributes. Afterwards, profile attributes were weighted using a genetic algorithm. This step aims to provide two important functions: to increase the accuracy of the classifiers and to provide insight into which attributes are more effective for identifying network applications. Analysis of several key attributes is provided in this chapter, as well as a direct comparison between traditional and motif-based profiles.

6.1 Preliminary Investigations

Because motifs have not been applied in the realm of application identification, some preliminary classification work was required to vet this approach. Profiles for each of the 65 application graphs were created using a combination of significant order 3 and order 4 motifs, where each attribute represents the frequency of a particular motif within that graph. The results provided in Table 6.1 were encouraging (for the full classification results see Appendix D). Perhaps a more interesting question, however, is not whether an entire graph of communications can be correctly classified, but instead whether the activities of a particular host can be identified. It is on this question that the remainder of the chapter is focused.


Protocol  AIM  DNS  HTTP  Kazaa  MSDS  Netbios  SSH
Accuracy  80%  80%   90%    40%   60%     100%  80%

Table 6.1: Classification accuracy of 65 application graphs

6.2 Initial Results

Classification results are presented as confusion matrices; each row of the table represents a predicted class label (an application in this case), while the columns represent the true class label. The boldface numbers along the diagonal indicate correct classifications. Confusion matrices also show false positives and false negatives. Data points that are predicted to have a certain class label but are incorrect are known as false positives, found in the rows of the matrices. False negatives are examples of a particular class that are incorrectly labeled, shown in the columns. For example, given a set of data that is predicted to be hosts sharing files via Kazaa, true positives would be those hosts that are actually using the P2P application while false positives would be those hosts that are not. Conversely, given a set of data that is known to be file-sharing hosts, false negatives would be those that are not labeled as using the Kazaa application.

          True A   True B   True C   Precision
Pred. A        5        2        0       71.4%
Pred. B        3        3        2       37.5%
Pred. C        0        1       11       91.7%
Recall     62.5%    50.0%    84.6%     „ 70.4%

Table 6.2: An example confusion matrix with three classes

The performance of the nearest-neighbor classification models is described by three different accuracy measures. The overall accuracy of a model (denoted by „ next to the number in the bottom-right corner) is simply the number of correct classifications (true positives) over all classifications. Given a set of predictions of a particular label, class precision is a measure of the accuracy of those predicted labels.

It is the ratio of correct predictions of label l to all predictions of label l. It can be written:

precision = true positives / (true positives + false positives)    (6.1)

Class recall (also called sensitivity) measures the accuracy of predicted labels if provided a complete set of true labels. Recall is given by the following equation:

recall = true positives / (true positives + false negatives)    (6.2)

Table 6.2 displays the results of an example classification experiment, as well as the accuracy, precision and recall measures. This confusion matrix shows that while the classifier has some trouble distinguishing between class A and class B, it can effectively detect examples of class C.
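The three measures can be sketched directly from a predicted-by-true count matrix; the example below reuses the counts of the three-class illustration in Table 6.2.

```python
def confusion_stats(matrix, labels):
    """Compute per-class precision (Eq. 6.1), per-class recall
    (Eq. 6.2), and overall accuracy from a confusion matrix whose
    rows are predicted labels and columns are true labels."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(matrix[i])              # tp + false positives
        actual = sum(row[i] for row in matrix)  # tp + false negatives
        stats[label] = {"precision": tp / predicted, "recall": tp / actual}
    return stats, correct / total

# The three-class example: accuracy (5 + 3 + 11) / 27, about 70.4%.
m = [[5, 2, 0],
     [3, 3, 2],
     [0, 1, 11]]
stats, accuracy = confusion_stats(m, ["A", "B", "C"])
```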

6.2.1 Traditional Graph Measure Profiles

To remind the reader, traditional graph measure profiles have eleven attributes including degree counts, centrality measures and clustering coefficient (see Section 5.4 for the full list). There are a total of 3,940 unique hosts found in the 65 application graphs. Each line of the input file for the nearest neighbor algorithm contains the true label assigned to the host and the eleven graph measures associated with that host. Not all protocols have an equal number of training examples due to the popularity and availability of certain applications in the trace files, but each protocol has 400-800 examples.

The computational load of a single test point for the nearest neighbor algorithm is O(nd) where n is the number of training samples and d is the number of attributes.

When using 10-fold cross validation, the test set has size n/10 and ten iterations are run, making the overall complexity of this process O(n²) if we absorb the constant d into the expression. Although other methods exist to reduce the number of computations necessary, RapidMiner is able to generate the model and accuracy measures for 3,940 data points in just a few seconds. Table 6.3 shows the resulting confusion matrix, where each row is the predicted label and each column is the actual label.

           AIM    DNS   HTTP  Kazaa   MSDS  Netbios    SSH  Precision
AIM        417     29     89     37     91       41    320     40.72%
DNS          2    612      6      1      7       20     10     93.01%
HTTP        49     11    658      2     20       32     11     84.04%
Kazaa        1      1      3    355      1        1      0     98.07%
MSDS        10      5     13      1    255       10     29     78.95%
Netbios     13     19     24      4      9      655      2     90.22%
SSH          8      3      7      0     17        1     28     43.75%
Recall  83.40% 90.00% 82.25% 88.75% 63.75%   86.18%  7.00%   „ 75.63%

Table 6.3: Confusion matrix of unweighted traditional graph measures

After this initial classification test there are five protocols that have greater than 80% of their profiles labeled correctly (class recall): AIM, DNS, HTTP, Kazaa and Netbios. The class recall of SSH is strikingly low at 7%. SSH is a particularly difficult application to classify because it is used for a variety of tasks, including remote management of hosts, application tunneling and file transfers using the secure copy program. The traditional measures of SSH application graphs often resemble those of other applications, resulting in a low class recall value.

Despite the fact that about half of the eleven traditional attributes are real-valued, several ties occur when the nearest-neighbor algorithm computes the Euclidean distance from a test point to the training points in the model. In this study, a tie situation is termed a profile collision (described in a moment). This behavior is desirable if the tie is between profiles of the same class, suggesting that certain profiles are strongly indicative of a particular application. However, many examples also tie with examples from several different classes. When this happens, RapidMiner naively assigns the first label in the list of ties to the test point. The order of this comparison is affected by the order of the input data. AIM is the first protocol in the list, so many multi-class ties are labeled as AIM, which explains why 80% of SSH traffic is classified as AIM. It also explains the low class precision for AIM, since any multi-class tie involving AIM receives that label. Regardless of RapidMiner's tie-breaking algorithm, classification inaccuracies are caused in part by the high rate of overlap among classes when using traditional graph measure profiles.

Profile Collisions

More often than not, the shared label of a single-class tie matches the true label of the test point. With the method used by RapidMiner to break ties, multi-class ties are more likely to be incorrectly labeled than single-class ties. Table 6.4 summarizes the single and multi-class ties for each protocol. The three protocols involved in the most multi-class ties (AIM, MSDS, SSH) also have the lowest class recalls.

                    AIM  DNS  HTTP  Kazaa  MSDS  Netbios  SSH
Single-class ties   182  567   422    343   219      530   17
Multi-class ties    110   29    49     46    83       33  331

Table 6.4: Number of single and multi-class ties for traditional graph measures

To explore the properties of vertex profiles in more detail, profile collisions are introduced. A profile collision can occur in one of two ways: the distance from a test point to a training point is zero, or a test point is equidistant from two or more training points. Written mathematically:

d(z, t_1) = 0, or

d(z, t_1) = d(z, t_2) = ... = d(z, t_n)

where z is a test point and t_1, ..., t_n are training points. Note that the first type of profile collision results from vertices having identical profiles. The collision graphs in Figure 6.1 show the total number of collisions with other profiles for each protocol. For example, Figure 6.1(a) shows that AIM profiles collide with SSH profiles more frequently than they do with other AIM profiles.
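The two collision conditions above can be checked directly. The following sketch is illustrative; the tolerance parameter is my own addition to cope with floating-point comparison and is not part of the original definition:

```python
import math

def profile_collisions(z, training_points, tol=1e-12):
    """Detect the two kinds of profile collision for a test point z:
    an exact match (distance zero, i.e. identical profiles) or a tie,
    where z is equidistant from two or more nearest training points."""
    dists = [math.dist(z, t) for t in training_points]
    d_min = min(dists)
    nearest = [i for i, d in enumerate(dists) if abs(d - d_min) <= tol]
    exact_match = d_min <= tol  # d(z, t_1) = 0
    tie = len(nearest) > 1      # d(z, t_1) = d(z, t_2) = ... = d(z, t_n)
    return exact_match, tie, nearest
```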

Figure 6.1: Profile collisions for traditional graph measures [(a) AIM, (b) DNS, (c) HTTP, (d) Kazaa, (e) MSDS, (f) Netbios, (g) SSH]

6.2.2 Motif-based Profiles

Although profiles based on traditional graph measures can identify applications with some success, there is certainly room for improvement. This section presents the results of utilizing network motifs in a new way to characterize application graphs. The result files from FANMOD were parsed for significant motifs (those with a p-value of 0 and that occur in at least 1% of subgraphs), finding 130 motifs to be used as profile attributes. Only those hosts that participate in at least one of the significant motifs were considered in this part of the study. As a result, the total number of profiles is 3,546 instead of 3,940 as in the traditional approach.

Table 6.5 presents the classifier results using motif-based profiles. Although the discussion comparing the performance of profile types is saved for later, the reader may notice that the class recall has improved for six of the seven protocols measured, all except AIM. Only four protocols score greater than 80% in the motif-based approach, but three of these four (DNS, Kazaa, Netbios) have improved well into the 90% range. Figure 6.2 provides the profile collisions for the motif approach, while the numbers of single and multi-class ties are given in Table 6.6. Again, the protocols involved in a higher percentage of multi-class ties generally score lower than those that are not.

            AIM    DNS   HTTP  Kazaa  MSDS  Netbios    SSH  Precision
AIM         277     10     61      0    21        0     33     68.91%
DNS           9    630      5      8     6        0      3     95.31%
HTTP        136     13    665      1    21        6     29     76.35%
Kazaa         5      0      0    370     4       34      2     89.16%
MSDS          4      4     15      1   256        1      4     89.82%
Netbios       2      1      0      0     4      699      0     99.01%
SSH          35      1     24      2    60        0     84     40.78%
Recall   59.19% 95.60% 86.36% 96.86% 68.82%  94.46% 54.19%  Overall: 84.07%

Table 6.5: Confusion matrix of unweighted motif-based profiles

                    AIM  DNS  HTTP  Kazaa  MSDS  Netbios  SSH
Single-class ties    93  576   446    195   231      611   17
Multi-class ties    283   32   223    181   119       47  123

Table 6.6: Number of single and multi-class ties for motif-based profiles

Synthesizing the accuracy results and collision information, it can be seen that the classifier confuses two protocol pairs in particular with one another: AIM with HTTP and Kazaa with Netbios. In the case of the Netbios name service, a broadcast message is sent to the local network to locate a particular machine that has a registered name. Somewhat similarly, if a Kazaa user wishes to locate a file, they contact an active supernode, which then communicates with the ordinary nodes attached to it to query for the desired file.

Even though AIM and HTTP are both classified more accurately with motifs than with traditional measures, the indistinctness of the boundary between the two protocols is a bit surprising. The arrangement of nodes in Figure 6.3(a) reflects the expected communication patterns given the functional and social characteristics of the HTTP protocol. Some web servers are more popular than others and would have a higher degree count. Additionally, web servers will often establish communications with other web servers to pull content from RSS feeds, ad servers, or other content providers. With the exception of direct connections for file transfers, all AIM communications go through a central server, so one would expect to see a stronger influence of a star topology in the application graphs. Figure 6.3(b) shows that this does not seem to be the case. There are several possible reasons for this. For example, the actual IP address of the central AIM server will be anonymized to a different random IP address across each of the network trace files. Given the popularity of instant messaging and the prevalence of the AIM client, it is possible that there are actually several servers that handle connections and are load-balanced as necessary. A more in-depth examination of some application protocols will be required in future work.

Figure 6.2: Profile collisions for motif-based profiles [(a) AIM, (b) DNS, (c) HTTP, (d) Kazaa, (e) MSDS, (f) Netbios, (g) SSH]

The other protocol that needs further explanation is SSH. Figure 6.2(g) shows that SSH has a high collision rate with other protocols and points out an important weakness in the current motif-based approach. The application graph in Figure 6.3(c) shows the tendency of SSH application graphs to be very disconnected, comprised of several much smaller components instead of one large connected component. Current motif profiles are based on order 3 and order 4 network motifs, so these pairs of connections are ignored. This is made evident by Table 6.7, which shows that less than 40% of SSH traffic is included in the motif model, significantly less than for the other six protocols. This difficulty presents another interesting area of work to be performed in the future.

Figure 6.3: Depiction of three application graphs: (a) HTTP, (b) AIM, (c) SSH

            AIM    DNS   HTTP  Kazaa  MSDS  Netbios    SSH
Data kept  93.6%  96.9%  96.3%  95.5%  93.0%   97.4%  38.8%

Table 6.7: Percentage of original data used in motif-based profiles

6.3 Weighted Profiles and Key Attributes

In an effort to improve the performance of the two types of classifiers, the attributes of each profile were weighted using a genetic algorithm. This process only increases the accuracy of each model slightly, but it also allows for the investigation of a problem space that might otherwise be too computationally expensive to explore. This section details the results of weighting the attributes and discusses some of the key characteristics.

6.3.1 Attribute Weights of Traditional Graph Measures

After running the evolutionary feature weighting experiment for thirty generations, a set of attribute weights was obtained that increased the overall accuracy of the traditional graph measure classifier by roughly 4%. The weights of the eleven attributes are provided in Table 6.8.

Attribute                 Weight
Indegree                   0.259
Outdegree                  0.000
Total degree               0.023
Clustering coefficient     0.172
Betweenness centrality     0.271
Degree centrality          1.000
Closeness centrality       0.257
Eigenvector centrality     0.633
Eccentricity               0.596
Center                     0.393
Periphery                  0.096

Table 6.8: Attribute weights for traditional graph measures

The weights of the attributes reflect the interaction of all eleven graph measures and are the values that maximize the accuracy of the classifier. They should therefore not be interpreted too literally in isolation. For example, degree centrality was weighted with a 1.000, the highest possible weight. This does not mean that an accurate classifier could be built on this attribute alone. Section 6.5 addresses this point further. However, the table does still provide some insight as to which attributes might be more useful when performing classification tasks. It is not surprising that the degree counts are not weighted especially high, as they are a very generic measure. The "periphery" attribute has a low weight because it is not a very unique measure; out of the 3,940 profiles, 2,132 of them are periphery nodes. In contrast, only 773 nodes are central nodes, and the "center" attribute accordingly receives a higher weight.
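One common way to incorporate such weights into the nearest-neighbor distance, and the interpretation assumed in this sketch (RapidMiner's internal handling may differ), is to scale each attribute's contribution to the squared Euclidean distance:

```python
import math

def weighted_distance(x, y, weights):
    """Euclidean distance in which each attribute's squared difference
    is scaled by its learned weight; a zero weight (e.g. outdegree in
    Table 6.8) removes the attribute from the comparison entirely."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, x, y)))
```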

Figure 6.4 shows the per-protocol accuracy of both unweighted and weighted profiles based on traditional graph measures. The confusion matrix for weighted attributes can be found in Appendix D. As one would hope, the weighted attribute profiles perform slightly better than their unweighted counterparts for each protocol. The class recall for SSH is again very low for the same reasons described previously.

Figure 6.4: Accuracy of unweighted vs. weighted traditional graph measure profiles

6.3.2 Attribute Weights of Motif-based Measures

Because of the high dimensionality of motif-based profiles, it becomes more important to take advantage of other methods such as genetic algorithms to explore the attributes. Figure 6.5 depicts the ten most heavily weighted motifs and their corresponding weight values. In the figure, green nodes represent clients, black nodes represent servers and red nodes represent peers, as specified in Definition 1. As with the weights of traditional graph measures, these weights reflect the combined information from all attributes.

Motif 6.5(a) is the most highly weighted of the significant motifs found in this study and only occurs in two application graphs: there are 24 instances of it in an MSDS graph and another 137 instances in a Netbios graph. Although weighted lower, motif 6.5(b) occurs overwhelmingly more frequently in Netbios (1,007 instances) than it does in MSDS (3 instances) or DNS (2 instances). If a node were to occur in these two motifs, there would probably be a good chance that the host was using the Netbios application.

Figure 6.5: The ten highest-weighted motifs and their corresponding weights: (a) 1.000, (b) 0.662, (c) 0.650, (d) 0.632, (e) 0.585, (f) 0.545, (g) 0.537, (h) 0.503, (i) 0.503, (j) 0.502

Unfortunately, the weights do not indicate which particular application(s) a motif helps to delineate, only which motifs successfully increase the overall accuracy of the classifier. Perusing the profile data reveals that instances of many motifs are found in several or all of the applications studied. This is not to say motif profiles are unsuitable for describing computer networks (they have already shown a great deal of promise), rather that no single motif is indicative of a particular application. Given the complexity of the highly dynamic interactions that occur in computer networks, this is not entirely surprising. It is possible that different types of motifs (described in Chapter 7) could be even more beneficial than the current generation of motifs and motif profiles.

One final point to address before moving on to a comparison of traditional and motif-based profiles is the performance of unweighted vs. weighted motif profiles, shown in Figure 6.6. There is a slight increase in classification accuracy for each of the protocols except Kazaa, which sees no additional gain from attribute weighting. The overall accuracy of the model increases to 85.70%, a difference of 1.63%. Appendix D contains the confusion matrix for weighted motif profiles.

Figure 6.6: Accuracy of unweighted vs. weighted motif-based profiles

6.4 Comparison of Profile Types

This section compares the two profile types side-by-side and discusses some of the advantages and disadvantages of each approach. The motif-based model generally outperforms traditional graph measures, though this is not always the case, as shown in Figure 6.7. Notably, the traditional profiles significantly outperform motif-based profiles for classifying AIM traffic, while the reverse is true for SSH (again, the SSH results should be taken with a grain of salt because slightly less than 40% of SSH traffic is classified by the second approach).

Weighting the profile attributes benefits traditional graph measures more than motif-based profiles. One reason for this might be the type of data used to describe each profile. Traditional profiles are comprised of a mixture of binary, real-valued and integer data. In addition to being purely binary, motif profiles are also sparse; most nodes participate in very few of the 130 significant motifs. As a result, many of the motif weights are multiplied by zero, resulting in no information gain. Regardless, weighting the attributes does not change which type of classifier performs better for a particular protocol, with the exception of HTTP: unweighted motifs have a 4% accuracy advantage over traditional measures, but fall to a 1% disadvantage when the profiles are weighted.

Figure 6.7: Accuracy comparison of profile types: (a) unweighted, (b) weighted

Advantages and Disadvantages of Profile Types

Motif-based profiles have a slight advantage over traditional measures in a few categories. The overall accuracy of the motif-based classifiers is higher than that of the traditional classifiers, both unweighted and weighted. Also, motif profiles result in more favorable overlap with other profiles. Only 10% of motif profiles do not match another profile, and 61% match profiles of a single label (note that "match" means a Euclidean distance of zero, not an identical profile). With traditional measures, on the other hand, 58% match a single label, and nearly 25% of profiles do not match any other profile.

Traditional graph measures are less demanding to compute than their motif counterparts. Even though some graph measures are O(n^3), where n is the order of the graph, calculations can be performed extremely quickly because n is small in the application graphs examined: 40 ≤ n ≤ 80. Motif searches are computationally expensive and can be prohibitively so when searching for large motifs. This study found that an exhaustive search of order 3 motifs could be completed in roughly 7-8 minutes, while an exhaustive search of order 4 motifs took 6-8 hours to complete.

6.5 Considerations for Optimizing Classifier Performance

There are several ways in which the performance of application classifiers may be improved. An “on the fly” traffic classification system would need to be as fast as possible so that network latency is minimized. One way to achieve increased classifier speed is to reduce the dimensionality of the data. Already the evolutionary feature weighting performed by the genetic algorithm has indicated which attributes are more valuable to the classifier. Attributes below a certain threshold value could be ignored, at the expense of a little bit of accuracy. Figure 6.8 demonstrates the accuracy of models based on a single traditional graph measure.

By far, eigenvector centrality, closeness centrality and degree centrality provide the most information to the classifier, each scoring better than 65% on its own. Most of the attributes score no better than a random guess with a 1/7 chance of being correct, shown as a vertical dotted line in the graph.

Figure 6.8: Accuracy of single-attribute classification

Recall that eigenvector centrality assigns a centrality score to a vertex proportional to that of its neighbors. This metric is more "social" in nature than some of the others in that the centrality scores of neighboring vertices are considered in the calculation. The idea of "distance" in an application graph is a bit tricky because it does not consider the number of hops data must go through to reach its final destination nor the physical distance between hosts. Therefore the "closeness" of closeness centrality describes the social usage of an application and suggests that the average shortest path length between nodes differs somewhat from application to application. The degree centrality is essentially a weighted degree count, which again suggests that the sizes of connected components within application graphs are important, influenced in part by the popularity of servers and services.
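The thresholding idea suggested above can be illustrated with the weights of Table 6.8; the dictionary keys and function name are my own:

```python
# Learned weights from Table 6.8 (traditional graph measures).
WEIGHTS = {
    "indegree": 0.259, "outdegree": 0.000, "total_degree": 0.023,
    "clustering_coefficient": 0.172, "betweenness_centrality": 0.271,
    "degree_centrality": 1.000, "closeness_centrality": 0.257,
    "eigenvector_centrality": 0.633, "eccentricity": 0.596,
    "center": 0.393, "periphery": 0.096,
}

def prune_attributes(weights, threshold):
    """Keep only attributes whose learned weight meets the threshold,
    shrinking the d in the classifier's O(nd) per-query cost."""
    return sorted(attr for attr, w in weights.items() if w >= threshold)
```

With a threshold of 0.5, for instance, only degree centrality, eccentricity and eigenvector centrality would survive.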

In addition to reducing the dimensionality of the attribute profiles, one can also consider reducing the number of data points used in the training phase of the nearest neighbor algorithm. Exploring the effectiveness of smaller classification models has two important implications. First of all, it suggests that a more lightweight classifier could be built when heading towards a real-time implementation. Secondly, it shows that the methods proposed in this study can be used for smaller networks and not just those containing thousands of nodes.

To test this hypothesis, several unweighted classifiers were built for each profile type with an increasing number of nodes in each model. The data was selected at random, while keeping the proportions of each class label the same as in the models previously discussed. All of the test parameters are as they were before, including the use of 10-fold cross validation to determine the accuracy. The results of this experiment are illustrated in Figure 6.9.
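The stratified subsampling just described (random selection that preserves class proportions) might be sketched as follows; the names are illustrative:

```python
import random
from collections import defaultdict

def stratified_sample(profiles, fraction, seed=0):
    """Randomly subsample (label, attributes) pairs while preserving
    the per-class proportions of the full data set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, attrs in profiles:
        by_label[label].append((label, attrs))
    sample = []
    for group in by_label.values():
        k = max(1, round(fraction * len(group)))  # at least one example per class
        sample.extend(rng.sample(group, k))
    rng.shuffle(sample)  # avoid grouping the output by class
    return sample
```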

Figure 6.9: Comparison of profile types as the size of the training set increases: (a) traditional graph measures, (b) motif-based profiles

The classifiers tend to perform slightly better as the number of training data points increases, but sometimes negligibly so. DNS, Kazaa and Netbios seem to benefit the least from having additional training examples, while AIM and MSDS fluctuate quite a bit more. It is interesting to note that applications that were previously classified more accurately also exhibit more stable behavior in Figure 6.9. This is true for both profile types. For example, AIM and MSDS were by far the least accurately classified protocols using a motif-based approach, and their trend lines exhibit the most volatility in Figure 6.9(b). In contrast, DNS, Kazaa and Netbios were the most accurately classified protocols, and their trend lines are nearly flat. This finding suggests that the protocols which can be clearly described by a profile (traditional or motif-based) can be learned with a relatively small number of training points. Further investigation into the AIM and MSDS protocols is needed to understand why the accuracy of AIM peaks at 2,500 nodes and then declines, while the accuracy of MSDS peaks at 1,000 nodes and then drops significantly.

6.6 Limitations of Current Approach

This chapter has demonstrated the promise of using vertex profiles to identify application usage across a computer network. A few of the shortcomings of the proposed methodology have been touched upon already, but are summarized here. Graph size is an important factor to consider, since more "interesting" vertex characterizations arise from the complex interactions of hosts. Motif-based profiles become more descriptive as hosts communicate with a larger number of other hosts. The current generation of classification models suffers when there is heavy overlap among profiles, resulting in a distance of zero. A more intelligent tie-breaking scheme could yield better performance for those protocols that share application graph characteristics. Currently, the motif-based approach only considers motifs of order 3 and order 4. This causes a problem for protocols like SSH that tend to have a large number of small connected components instead of fewer large connected components. Some of the stages in the process are computationally expensive. The genetic algorithm used for feature weighting is a very time-consuming endeavor and does not yield the desired increase in performance. On the other hand, once a network is learned and a classifier built, the attribute weights need only be computed once and can be applied in O(n) time to the attributes collected for the test points. Additionally, the analysis techniques put forth by this work require a view of the network that shows as many of the interactions as possible.

Chapter 7: Conclusions and Future Work

The tasks of managing and securing computer networks are becoming increasingly complicated due to the use of applications over non-standard port numbers as well as the use of data encryption techniques. These practices subvert a network administrator's ability to provide quality of service to legitimate users, ensure compliance with security policies, and prevent outside intruders from gaining access to a system. Intrusion detection systems and network monitoring tools that rely on deep packet inspection are ineffective when data transfers are encrypted. Several previous studies have attempted to classify network application usage by examining flow characteristics pertaining to a particular series of communications between two hosts, such as the size of the data packets being sent, packet inter-arrival times and session lengths.

This thesis has proposed an interdisciplinary approach to the study of networks through the characterization of application graphs. It is an "in the dark" methodology that relies on the communication patterns found in a network, rather than the contents of packet payloads or the port numbers used by the application. A wide variety of graph measures, borrowed heavily from social network analysis, are used to create vertex profiles that determine the application in which a host participates.

Furthermore, this work has uniquely applied motif-based analysis, used almost exclusively in systems biology, to the study of application graphs. This method of detecting significant subgraph patterns has shown a great deal of promise for modeling and classifying application protocols. It has been shown that motifs can not only be used to express communication patterns, but also to indicate the functional role of a host. In this study, nodes were labeled as either a client, server, or peer based upon their interactions at the transport layer. This information was used to generate motifs. A second type of vertex profile was defined, based upon a node's participation (or lack thereof) in the motifs that were found to be significant across all of the application protocols examined.

Through empirical testing, this study has shown that both types of profiles can determine what application a host is using with a reasonable amount of accuracy. Although some protocols like SSH and AIM present difficulties, many of the others can be classified with greater than 80% accuracy, and in the case of weighted motif profiles, as high as 96% for the peer-to-peer application Kazaa. In general, a motif-based approach outperforms traditional graph measures and seems to have more potential for related work in the future.

One issue to consider is how best to manage connected components in application graphs that contain only two nodes. This phenomenon was found to occur frequently in SSH, contributing to the fact that less than 40% of SSH hosts were classified by the motif-based approach. Ignoring vertex colors, there are three possible order 2 motifs: A → B, A ← B, and A ↔ B. Unfortunately, the edge-switching operations for creating random graphs will not provide sufficiently randomized graphs, so it is unlikely that any particular order 2 pattern would be found statistically significant.

Currently, the only information utilized in the creation of application graphs is the source and destination IP addresses and the source and destination port numbers. The motif-based approach provides some additional information by using vertex colors to represent node types, but other information could also be exploited to color the edges. For example, colors could be used to denote the amount of data transferred between two nodes. This would help create more detailed profiles that might be able to distinguish between applications that have similar connection patterns but use network bandwidth in ways that are distinct from one another. Also related to the creation of application graphs, it would be interesting to observe the data flow through all nodes involved in a particular activity and not just the flow on a particular port number. A web server might request content from an application or database server in response to a client's request for a web document; these back-end communications to other related services occur on a port other than 80, the usual HTTP port number.

Another area to explore is the different machine learning techniques that can be applied to vertex profiles for classification and feature weighting; nearest-neighbor and genetic algorithms are only two possibilities. The many parameters of these algorithms require further tuning to optimize the classification accuracy of the models built. This thesis describes a process which allows the substitution of particular algorithms. For example, a Bayes classifier or support vector machine could be used instead of nearest-neighbor, while principal component analysis could be used in place of the genetic algorithm [67, 62].

Although not used in the current approach, temporal information could also prove to be useful in classifying application protocols. One approach would be to encode information such as session lengths or packet inter-arrival times into the edge colors. Another use of time-based information would be to observe communication patterns over a much smaller time window (on the order of seconds or minutes instead of hours) and determine how a node’s participation in motifs changes over time.

Moving away from implementation details and algorithm decisions, this type of research can be expanded outside of application identification. Assuming that the process can be tweaked to allow high accuracy in protocol recognition, this approach could be used to detect anomalies in network behavior. Hosts that participate in activities that look similar to a known application but differ by more than an established threshold value would be considered anomalous for that particular application and trigger an alert. One final consideration is pushing this research further into the realm of social network analysis, applying it to the detection of communities and associations within a network, such as locating all hosts that are part of the same online gaming community.

References

[1] P. Dyson, Dictionary of Networking. Sybex, 1999.

[2] A. S. Tanenbaum, Computer Networks. Prentice Hall, 2003.

[3] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, October 2002. [Online]. Available: http://dx.doi.org/10.1126/science.298.5594.824

[4] F. Rasche and S. Wernicke, "FANMOD manual," 2006.

[5] INPUT federal IT market forecast 2008–2013. [Online]. Available: http://www.input.com/corp/library/detail.cfm?itemid=5437&cmp=OTC-fedinfosecfcst08

[6] The SANS security policy project. [Online]. Available: http://www.sans.org/resources/policies/

[7] S. Christey and R. A. Martin, "CVE - vulnerability type distributions in CVE," 2007 technical white paper on the distribution of vulnerabilities reported to CVE.

[8] Internet Assigned Numbers Authority: assigned port numbers. [Online]. Available: http://iana.org/assignments/port-numbers

[9] Netstat. [Online]. Available: http://www.netstat.net/

[10] Wireshark: a network protocol analyzer. [Online]. Available: http://www.wireshark.org/

[11] Snort - the de facto standard for intrusion detection/prevention. [Online]. Available: http://www.snort.org/

[12] M. E. J. Newman, "Coauthorship networks and patterns of scientific collaboration," in Proceedings of the National Academy of Science, 2004, pp. 5200–5205.

[13] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

[14] C. Yang and T. Ng, "Terrorism and crime related weblog social network: link, content analysis and information visualization," Intelligence and Security Informatics, 2007 IEEE, pp. 55–58, May 2007.

[15] E. Yeger-Lotem, S. Sattath, N. Kashtan, S. Itzkovitz, R. Milo, R. Y. Pinter, U. Alon, and H. Margalit, "Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction," Proceedings of the National Academy of Sciences of the United States of America, vol. 101, no. 16, pp. 5934–5939, April 2004. [Online]. Available: http://dx.doi.org/10.1073/pnas.0306752101

[16] S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, "Network motifs in the transcriptional regulation network of Escherichia coli," Nat Genet, vol. 31, no. 1, pp. 64–68, May 2002. [Online]. Available: http://dx.doi.org/10.1038/ng881

[17] J. Grochow and M. Kellis, “Network motif discovery using subgraph enumeration and symmetry-breaking,” 2007, pp. 92–106.

[18] U. Alon, “Network motifs: Theory and experimental approaches,” Nature Reviews Genetics, vol. 8, no. 6, pp. 450–461, Jun. 2007.

[19] J. Day and H. Zimmermann, "The OSI reference model," Proceedings of the IEEE, vol. 71, no. 12, pp. 1334–1340, Dec. 1983.

[20] V. Cerf and R. Kahn, “A protocol for packet network intercommunication,” IEEE Transactions on Communications [legacy, pre-1988], vol. 22, no. 5, pp. 637–648, May 1974.

[21] L. Euler, “Solutio problematis ad geometriam situs pertinentis,” in Commentarii academiae scientiarum imperialis Petropolitanae. St. Petersburg Academy, 1736, vol. 8.

[22] R. G. Busacker and T. L. Saaty, Finite Graphs and Networks, ser. International Series in Pure and Applied Mathematics. McGraw-Hill, 1965.

[23] G. Chartrand and P. Zhang, Introduction to Graph Theory. McGraw-Hill, 2005.

[24] A. A. Nanavati, R. Singh, D. Chakraborty, K. Dasgupta, S. Mukherjea, G. Gurumurthy, and A. Joshi, “Analyzing the Structure and Evolution of Massive Telecom Graphs,” IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 5, pp. 703–718, March 2008.

[25] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, pp. 269–271, 1959.

[26] L. C. Freeman, “Centrality in social networks conceptual clarification,” Social Net- works, vol. 1, no. 3, pp. 215–239.

[27] P. Bonacich, “Technique for analyzing overlapping memberships,” Sociological Method- ology, 1972.

[28] M. E. J. Newman, Mathematics of Networks. Palgrave Macmillan, 2008.

[29] D. J. Watts and S. H. Strogatz, “Collective dynamics of ’small-world’ networks,” Na- ture, vol. 393, 1998.

[30] S. Cheung, R. Crawford, M. Dilger, J. Frank, J. Hoagland, K. Levitt, J. Rowe, S. Staniford-Chen, R. Yip, and D. Zerkle, “GrIDS – a graph-based intrusion detection system for large networks,” in Proceedings of the 19th National Information Systems Security Conference, 1996, pp. 361–370.

[31] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “BLINC: multilevel traffic classification in the dark,” in SIGCOMM ’05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications. New York, NY, USA: ACM, 2005, pp. 229–240.

[32] S. Wernicke and F. Rasche, “FANMOD: a tool for fast network motif detection,” Bioinformatics, vol. 22, no. 9, pp. 1152–1153, 2006.

[33] R. Itzhack, Y. Mogilevski, and Y. Louzoun, “An optimal algorithm for counting net- work motifs,” Physica A, vol. 381, pp. 482–490, Jul. 2007.

[34] S. Mangan, A. Zaslaver, and U. Alon, “The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks,” Journal of Molecular Biology, vol. 334, no. 2, pp. 197–204, November 2003.

[35] Tcpdump/libpcap public repository. [Online]. Available: http://www.tcpdump.org/

[36] J. Postel, “Internet Protocol,” RFC 791 (Standard), Sep. 1981, updated by RFC 1349. [Online]. Available: http://www.ietf.org/rfc/rfc791.txt

[37] J. Postel, “Transmission Control Protocol,” RFC 793 (Standard), Sep. 1981, updated by RFC 3168. [Online]. Available: http://www.ietf.org/rfc/rfc793.txt

[38] J. Postel, “User Datagram Protocol,” RFC 768 (Standard), Aug. 1980. [Online]. Available: http://www.ietf.org/rfc/rfc768.txt

[39] R. Pang, M. Allman, V. Paxson, and J. Lee, “The devil and packet trace anonymization,” ACM Computer Communication Review, vol. 36, no. 1, pp. 29–38, January 2006. [Online]. Available: http://www.icir.org/mallman/papers/devil-ccr-jan06.pdf

[40] G. Iannaccone, C. Diot, I. Graham, and N. McKeown, “Monitoring very high speed links,” in IMW ’01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement. New York, NY, USA: ACM, 2001, pp. 267–271.

[41] T. Henderson, D. Kotz, and I. Abyzov, “The changing usage of a mature campus-wide wireless network,” Computer Networks, vol. In Press, Accepted Manuscript. [Online]. Available: http://dx.doi.org/10.1016/j.comnet.2008.05.003

[42] P. Gill, M. Arlitt, Z. Li, and A. Mahanti, “Youtube traffic characterization: a view from the edge,” in IMC ’07: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. New York, NY, USA: ACM, 2007, pp. 15–28.

[43] E. Blanton. (2008, January) tcpurify. [Online]. Available: http://irg.cs.ohiou.edu/~eblanton/tcpurify/

[44] T. Gamer, C. P. Mayer, and M. Schöller, “PktAnon - A Generic Framework for Profile-based Traffic Anonymization,” PIK Praxis der Informationsverarbeitung und Kommunikation, vol. 2, pp. 67–81, Jun. 2008.

[45] D. Koukis, S. Antonatos, D. Antoniades, E. P. Markatos, P. Trimintzios, and M. Fukarakis, “CRAWDAD tool tools/sanitize/generic/anontool (v. 2006-09-26),” Downloaded from http://crawdad.cs.dartmouth.edu/tools/sanitize/generic/AnonTool, Sep. 2006.

[46] (2005) Lbnl enterprise trace repository. [Online]. Available: http://www.icir.org/enterprise-tracing/

[47] MIT Lincoln Laboratory: 1999 DARPA Intrusion Detection Evaluation Data Set. [Online]. Available: http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/1999data.html

[48] D. Kotz, T. Henderson, and I. Abyzov, “CRAWDAD trace dartmouth/campus/tcpdump/fall03 (v. 2004-11-09),” Downloaded from http://crawdad.cs.dartmouth.edu/dartmouth/campus/tcpdump/fall03, Nov. 2004.

[49] R. Chandra, R. Mahajan, V. Padmanabhan, and M. Zhang, “CRAWDAD data set microsoft/osdi2006 (v. 2007-05-23),” Downloaded from http://crawdad.cs.dartmouth.edu/microsoft/osdi2006, May 2007.

[50] OSCAR protocol. [Online]. Available: http://dev.aol.com/aim/oscar/

[51] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, “Hypertext Transfer Protocol - HTTP/1.1,” RFC 2616 (Standard), Jun. 1999. [Online]. Available: http://www.ietf.org/rfc/rfc2616.txt

[52] P. V. Mockapetris, “Domain names - implementation and specification,” RFC 1035 (Standard), United States, 1987. [Online]. Available: http://www.ietf.org/rfc/rfc1035.txt

[53] Active Directory. [Online]. Available: http://www.microsoft.com/windowsserver2008/en/us/active-directory.aspx

[54] R. Marty. Afterglow. [Online]. Available: http://www.afterglow.sourceforge.net/

[55] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using networkx,” in Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA USA, Aug. 2008, pp. 11–15.

[56] N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon, “Mfinder tool guide,” 2002.

[57] F. Schreiber and H. Schwöbbermeyer, “MAVisto: a tool for the exploration of network motifs,” Bioinformatics, 2005.

[58] W. de Nooy, A. Mrvar, and V. Batagelj, Exploratory Social Network Analysis with Pajek. Cambridge University Press, 2005.

[59] S. Wernicke, “Efficient detection of network motifs,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 3, no. 4, pp. 347–359, 2006.

[60] W. Mendenhall and R. J. Beaver, Introduction to Probability and Statistics, 8th ed. PWS-Kent Publishing Company, 1991.

[61] I. Mierswa, M. Wurst, R. Klinkenberg, M. Scholz, and T. Euler, “Yale: rapid prototyping for complex data mining tasks.” New York, NY, USA: ACM, 2006, pp. 935–940.

[62] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison Wesley, 2006.

[63] G. F. Luger, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th ed. Addison Wesley, 2005.

[64] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, 1998.

[65] J. J. Grefenstette, “Optimization of control parameters for genetic algorithms,” IEEE Transactions on Systems, Man and Cybernetics, vol. 16, no. 1, pp. 122–128, Jan. 1986.

[66] J. D. Schaffer, R. A. Caruana, L. J. Eshelman, and R. Das, “A study of control parameters affecting online performance of genetic algorithms for function optimization,” in Proceedings of the third international conference on Genetic algorithms. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, pp. 51–60.

[67] D. Lay, Linear Algebra and Its Applications, 2nd ed. Addison Wesley, 2000.

Appendix A: Examples of Application Graphs

Figure A.1: Application graphs depicting AIM communications

Figure A.2: Application graphs depicting DNS communications

Figure A.3: Application graphs depicting HTTP communications


Figure A.4: Application graphs depicting Kazaa communications

Figure A.5: Application graphs depicting MSDS communications

Figure A.6: Application graphs depicting Netbios communications

Figure A.7: Application graphs depicting SSH communications

Appendix B: Code Listings

Listing B.1: tshark2mysql.py – stores pcap data into a MySQL database

#!/usr/bin/python

# This file parses tshark output read from stdin and inserts it into a
# MySQL database. It assumes the database has already been created and
# will create the necessary table.
#
# The tshark command is:
#   tshark -t e -r <capture file> tcp or udp

import sys
import MySQLdb

if len(sys.argv) != 2:
    sys.exit("Supply name of table to store data in\n")

try:
    conn = MySQLdb.connect(host="localhost",
                           user="root",
                           passwd="pass",
                           db="data")
except MySQLdb.Error, e:
    sys.exit("Error %d: %s" % (e.args[0], e.args[1]))

cursor = conn.cursor()
cursor.execute("DROP TABLE IF EXISTS %s" % sys.argv[1])

cursor.execute("""CREATE TABLE %s (
    id INT(11) NOT NULL AUTO_INCREMENT,
    ts DOUBLE NOT NULL DEFAULT '0.0',
    protocol VARCHAR(12) NOT NULL,
    sip VARCHAR(15) NOT NULL,
    sport INT(5) NOT NULL DEFAULT '0',
    dip VARCHAR(15) NOT NULL,
    dport INT(5) NOT NULL DEFAULT '0',
    length INT(11) NOT NULL DEFAULT '0',
    PRIMARY KEY id (id)
);""" % sys.argv[1])

rc = 0
while True:
    line = sys.stdin.readline()
    if not line:
        break
    v = line.split(' ')
    tmp = []
    for i in range(len(v)):
        if v[i] not in ('', '->'):
            tmp.append(v[i])
    v = tmp
    if len(v) == 8:
        try:
            ts = float(v[1])
            sip = v[2]
            dip = v[3]
            sport = int(v[4])
            dport = int(v[5])
            proto = v[6]
            length = int(v[7][:-2])  # strip off the newline character
            sql = ("INSERT INTO %s (ts, protocol, sip, sport, dip, dport, length) "
                   "VALUES (%f, '%s', '%s', %d, '%s', %d, %d)"
                   % (sys.argv[1], ts, proto, sip, sport, dip, dport, length))
            try:
                cursor.execute(sql)
                rc += cursor.rowcount
            except MySQLdb.Error, e:
                print "Error [%d]: %d: %s" % (rc, e.args[0], e.args[1])
        except Exception, e:
            print "ERROR:", v

cursor.close()
conn.commit()
conn.close()

print "\n%d rows inserted into %s\n" % (rc, sys.argv[1])

Listing B.2: graph_utils.py – implementation of adjacency matrix conversion and eigenvector centrality using the NetworkX API

import networkx as NX
import math

def adj_matrix(G):
    """
    Takes a networkx.Graph (undirected) as an argument and returns a
    list of lists representing the corresponding adjacency matrix.
    It can be referenced as you would a normal 2D matrix, A[i][j].

    Node IDs must be in [1, G.order()] (taken care of in
    eigenvector_centrality())
    """
    adj = []
    for n in G.nodes():
        row = []
        for m in range(len(G.nodes()) + 1):
            row.append(0)
        for m in NX.neighbors(G, n):
            row[m] = 1
        adj.append(row)
    # Get rid of first element of each row (nodes start at 1, adj is 0-based)
    for i in range(len(adj)):
        adj[i] = adj[i][1:]
    return adj

def eigenvector_centrality(G):
    """
    Takes an undirected graph (Graph or XGraph) and returns a dictionary
    of eigenvector centralities, keyed by node ID (similar to the
    centrality functions in networkx)

    Maps node labels to integers in [1, G.order()]

    Algorithm adapted from:
    http://www.analytictech.com/networks/centaids.htm
    """
    eigenvector_centralities = {}
    evCentrality = []
    evUpdate = []
    maxValue = -1.0

    for i in range(G.order()):
        evCentrality.append(1.0)
        evUpdate.append(0.0)

    H = NX.convert_node_labels_to_integers(G, first_label=1,
                                           discard_old_labels=False)
    labels = {}
    for k, v in H.node_labels.iteritems():
        labels[v] = k
    A = adj_matrix(H)

    # 30 iterations should be enough to converge to a solution
    for x in range(30):
        for i in range(G.order()):
            evUpdate[i] = 0.0
            for j in range(G.order()):
                if A[i][j] != 0:
                    evUpdate[i] += evCentrality[j]
        maxValue = 0
        for i in range(G.order()):
            maxValue += evUpdate[i] * evUpdate[i]
        maxValue = math.sqrt(maxValue)
        for i in range(G.order()):
            evCentrality[i] = evUpdate[i] / maxValue
    for i in range(1, G.order() + 1):
        eigenvector_centralities[labels[i]] = evCentrality[i - 1]

    return eigenvector_centralities
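The power iteration in Listing B.2 can be sanity-checked on a small graph. The sketch below (plain Python, independent of NetworkX; the toy graph and function signature are illustrative only) applies the same repeated multiply-and-renormalize update:

```python
import math

# Toy undirected graph: a triangle (0-1-2) with a pendant vertex 3
# attached to vertex 0.  A[i][j] = 1 iff vertices i and j share an edge.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]

def eigenvector_centrality(A, iterations=30):
    """Power iteration: repeatedly multiply the centrality vector by A
    and renormalize to unit Euclidean length."""
    n = len(A)
    c = [1.0] * n
    for _ in range(iterations):
        update = [sum(A[i][j] * c[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(u * u for u in update))
        c = [u / norm for u in update]
    return c

c = eigenvector_centrality(A)
# Vertex 0 (connected to everything) scores highest, the pendant
# vertex 3 lowest, and the symmetric vertices 1 and 2 tie.
```

Note that the toy graph is deliberately non-bipartite: on a bipartite graph the largest and smallest eigenvalues have equal magnitude and plain power iteration oscillates instead of converging.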

Listing B.3: node_props_main.py – creates application graphs from MySQL database and computes traditional graph metrics using the NetworkX API

#!/usr/bin/python

"""
This program creates a DiGraph and calculates various graph metrics,
converting the DiGraph to a Graph as necessary for some metrics

Usage:
    arg1 = table name
    arg2 = port number
    arg3 = max # of nodes to consider
"""

import sys
import networkx as NX
import MySQLdb

from graph_utils import *

class Node:
    """Class to hold properties of nodes"""
    in_degree = 0
    out_degree = 0
    degree = 0
    clustering = 0
    betweenness_centrality = 0
    degree_centrality = 0
    closeness_centrality = 0
    eigenvector_centrality = 0
    eccentricity = 0
    is_center = 0
    is_periphery = 0

if len(sys.argv) != 4:
    sys.exit("Provide table name, port number, and # nodes at command line\n")
table = sys.argv[1]
port = sys.argv[2]
n_max = int(sys.argv[3])

# MySQL connection
try:
    conn = MySQLdb.connect(host="localhost", user="root",
                           passwd="pass", db="data")
except MySQLdb.Error, e:
    sys.exit("Error %d: %s" % (e.args[0], e.args[1]))
cursor = conn.cursor()

sql = ("SELECT sip, dip, sport, dport FROM %s WHERE sport=%s OR dport=%s"
       % (table, port, port))
try:
    cursor.execute(sql)
except MySQLdb.Error, e:
    sys.exit("Error %d: %s" % (e.args[0], e.args[1]))

# Create a directed graph from SQL results
G = NX.DiGraph(name="%s_%s" % (port, n_max))
for i in range(cursor.rowcount):
    r = cursor.fetchone()
    if r[0] in G and r[1] in G:
        G.add_edge(r[0], r[1])
    else:
        if G.order() < n_max:
            G.add_node(r[0])
        if G.order() < n_max:
            G.add_node(r[1])
        if r[0] in G and r[1] in G:
            G.add_edge(r[0], r[1])

# Calculate graph properties
myNodes = {}
for n in G.nodes():
    myN = Node()
    # Basic Properties
    myN.degree = G.degree(n)
    myN.out_degree = G.out_degree(n)
    myN.in_degree = G.in_degree(n)
    myNodes[n] = myN

# The following measures are all computed on the undirected
# connected components of the graph.
H = G.to_undirected()
CCS = NX.connected_component_subgraphs(H)
for i in range(len(CCS)):
    if CCS[i].order() >= 2:
        cl = NX.clustering(CCS[i], with_labels=True)
        for k, v in cl.iteritems():
            myNodes[k].clustering = v

        bc = NX.betweenness_centrality(CCS[i])
        for k, v in bc.iteritems():
            myNodes[k].betweenness_centrality = v

        dc = NX.degree_centrality(CCS[i])
        for k, v in dc.iteritems():
            myNodes[k].degree_centrality = v

        cc = NX.closeness_centrality(CCS[i])
        for k, v in cc.iteritems():
            myNodes[k].closeness_centrality = v

        ec = eigenvector_centrality(CCS[i])
        for k, v in ec.iteritems():
            myNodes[k].eigenvector_centrality = v

        d = NX.diameter(CCS[i])
        r = NX.radius(CCS[i])
        ecc = NX.eccentricity(CCS[i], with_labels=True)
        for k, v in ecc.iteritems():
            myNodes[k].eccentricity = v
            if v == d:
                myNodes[k].is_periphery = 1
            if v == r:
                myNodes[k].is_center = 1
    else:
        pass

# Print results
for k, v in myNodes.iteritems():
    s = "%s,%d,%d,%d,%f,%f,%f,%f,%f,%d,%d,%d" % (port,
        v.in_degree, v.out_degree, v.degree, v.clustering,
        v.betweenness_centrality, v.degree_centrality,
        v.closeness_centrality, v.eigenvector_centrality,
        v.eccentricity, v.is_periphery, v.is_center)
    print s

Listing B.4: motif_results.py – parses FANMOD results for significant motifs

#!/usr/bin/python

"""
This program reads FANMOD result files and looks for significant motifs.
It associates each motif with an identifying integer ID and pickles the
results for later use
"""

import pickle
import glob
import pprint

filedir = "/home/eddie/research/fanmod/res_csvs/"

files = glob.glob('/home/eddie/research/fanmod/res_csvs/*.txt')

size3 = {}      # mapping for size 3 motifs
id3 = 0         # first ID for size 3
size4 = {}      # mapping for size 4 motifs
id4 = 0         # first ID for size 4

p_thresh = 0.0  # get motifs with pvalue <= p_thresh
pct_occ = 1.0   # get motifs with frequency >= pct_occ

# Iterate through files and make ID associations
for i in range(len(files)):
    inFile = files[i]
    msize = int(inFile[-14])  # Motif size is stored in filename
    f = open(inFile, 'r')
    file = []
    for l in f:
        l = l[:-1]
        if len(l) > 1:
            file.append(l)

    # Ignore stuff at top of file
    file = file[24:]
    for j in range(0, len(file), msize):
        adjMatrix = ""
        l1 = file[j].split(',')
        if (float(l1[6]) <= p_thresh) and (float(l1[2][:-1]) >= pct_occ):
            # If this is a significant motif...
            adjMatrix += l1[1]
            for k in range(1, msize):
                adjMatrix += file[j + k].split(',')[1]
            if msize == 3 and adjMatrix not in size3.values():
                size3[id3] = adjMatrix
                id3 += 1
            if msize == 4 and adjMatrix not in size4.values():
                size4[id4] = adjMatrix
                id4 += 1

    f.close()  # Close file handle

# Pickle the resulting dictionaries
s3 = open('s3map.pkl', 'w')
pickle.dump(size3, s3)
s3.close()
s4 = open('s4map.pkl', 'w')
pickle.dump(size4, s4)
s4.close()

Listing B.5: motif_profiles.py – creates motif profiles from FANMOD dump files

#!/usr/bin/python

"""
This file reads the pickled s3 and s4 maps and creates the binary
motif participation profiles for the NN clustering
"""

import sys
import pickle
from string import split
from pprint import pprint
from glob import glob

class profile:
    """Instances of profiles"""
    def __init__(self, id, l):
        self.ID = id
        self.label = l
        self.a = []
        for i in range(len(s3map) + len(s4map)):
            self.a.append(0)

    def mark(self, m):
        try:
            self.a[adjM[m]] = 1
        except KeyError:
            pass  # insignificant motif, not in our dict

# Unpickle adjMatrix mapping
s3 = open('s3map_1pct.pkl', 'r')
s3map = pickle.load(s3)
s3.close()
s4 = open('s4map_1pct.pkl', 'r')
s4map = pickle.load(s4)
s4.close()

# Create dictionary for adjMatrix mapping
adjM = {}
idx = 0
for k, v in s3map.iteritems():
    adjM[v] = idx
    idx += 1
for k, v in s4map.iteritems():
    adjM[v] = idx
    idx += 1

seen = {}  # dict for nodes

files = glob('/home/eddie/research/fanmod/data_new_lc/dumpfiles/*')
for i in range(len(files)):
    # Open file for reading
    f = open(files[i], 'r')
    # need a unique prefix since we will have multiple node 0, 1, 2, etc...
    prefix = (files[i].split("/")[7]).split("_")[:-2]
    tmp = ""
    for j in range(len(prefix)):
        tmp += prefix[j] + "_"
    prefix = tmp
    label = prefix.split("_")[-2]
    for lines in f:
        l = lines.split(",")
        # ignore header lines in dump files
        if len(l) > 2:
            for j in range(1, len(l)):
                myNode = prefix + str(int(l[j]))
                if myNode not in seen:
                    seen[myNode] = profile(myNode, label)
                seen[myNode].mark(str(l[0]))

    f.close()  # close file handle

for k, v in seen.iteritems():
    print v.ID, v.label,
    for i in range(len(v.a)):
        if i < len(v.a) - 1:
            print v.a[i],
        else:
            print v.a[i]

Appendix C: Test Parameters

Parameter [default]                                              Value
subgraph (motif) size [default: 3]                               3 / 4
# of samples used to determine approx. # of subgraphs [100000]   100000
full enumeration? 1(yes)/0(no) [1]                               1 (yes)
directed? 1(yes)/0(no) [1]                                       1 (yes)
colored vertices? 1(yes)/0(no) [0]                               1 (yes)
colored edges? 1(yes)/0(no) [0]                                  0 (no)
random type: 0(no regard)/1(global const)/2(local const) [2]     2
regard vertex colors? 1(yes)/0(no) [0]                           1 (yes)
regard edge colors? 1(yes)/0(no) [0]                             0 (no)
reestimate subgraph number? 1(yes)/0(no) [0]                     0 (no)
# of random networks [1000]                                      5000
# of exchanges per edge [3]                                      5
# of exchange attempts per edge [3]                              5

Table C.1: FANMOD test parameters
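With these parameters, FANMOD judges a motif's significance by comparing its frequency in the original network against its frequencies in the 5000 randomized networks. A minimal sketch of the empirical p-value this implies, assuming significance means "occurs at least as often in a randomized network" (the function name and counts are hypothetical):

```python
def motif_pvalue(observed_count, random_counts):
    """Empirical p-value: fraction of randomized networks in which the
    motif occurs at least as often as in the original network."""
    return sum(1 for r in random_counts
               if r >= observed_count) / float(len(random_counts))

# Hypothetical counts: the motif appears 40 times in the real network,
# and only one of 8 randomized networks matches that frequency.
p = motif_pvalue(40, [12, 18, 9, 41, 15, 22, 11, 7])
# p == 1/8 == 0.125
```

The thresholds used in Listing B.4 (p_thresh = 0.0, pct_occ = 1.0) then retain only motifs that this p-value never flags as reproducible by chance.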


Listing C.1: GA weights.xml – RapidMiner process parameters for genetic algorithm and 1-NN classification

<list key="application parameters"> … </list>
<list key="prediction parameters"> … </list>
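Listing C.1 drives RapidMiner's genetic-algorithm weighting and 1-NN classification. Over binary motif profiles, 1-NN classification amounts to returning the label of the training profile at minimum distance; the sketch below uses Hamming distance for clarity (the labels, profiles, and function names are hypothetical, and RapidMiner's configured distance measure may differ):

```python
def hamming(a, b):
    """Number of positions where two equal-length binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def one_nn(query, training):
    """1-NN: return the label of the training profile closest to query."""
    best_label, best_dist = None, None
    for label, prof in training:
        d = hamming(query, prof)
        if best_dist is None or d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical binary motif-participation profiles
training = [("HTTP", [1, 1, 0, 0]),
            ("DNS",  [0, 0, 1, 1]),
            ("SSH",  [1, 0, 1, 0])]
label = one_nn([1, 1, 0, 1], training)
# closest profile differs in one position, so label == "HTTP"
```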

Appendix D: Additional Classification Results

            AIM     DNS    HTTP   Kazaa    MSDS  Netbios     SSH  Precision
AIM           8       0       0       0       1        0       1     80.00%
DNS           0       8       0       2       2        0       0     66.67%
HTTP          2       0       9       0       1        0       1     69.23%
Kazaa         0       1       0       2       0        0       0     66.67%
MSDS          0       0       0       0       6        0       0    100.00%
Netbios       0       1       1       1       0       10       0     76.92%
SSH           0       0       0       0       0        0       8    100.00%
Recall:  80.00%  80.00%  90.00%  20.00%  60.00%  100.00%  80.00%

Overall accuracy: 78.46%

Table D.1: Confusion matrix of 65 application graphs using motif frequencies (rows: predicted class; columns: true class)

             True 5190  True 53  True 80  True 1214  True 445  True 137  True 22  Precision
Pred. 5190:        453       27       63         36        84        41       316     44.41%
Pred. 53:            1      621        3          0         4        15         8     95.25%
Pred. 80:           23        8      712          1         9        26         4     90.93%
Pred. 1214:          1        1        2        361         3         0         0     98.10%
Pred. 445:          13        3        2          2       282         8        36     81.50%
Pred. 137:           2       19       15          0         3       669         0     94.49%
Pred. 22:            7        1        3          0        15         1        36     57.14%
Recall:         90.60%   91.32%   89.00%     90.25%    70.50%    88.03%     9.00%

Overall accuracy: 79.54%

Table D.2: Confusion matrix of weighted traditional graph measures

             True 5190  True 53  True 80  True 1214  True 445  True 137  True 22  Precision
Pred. 5190:        298        8       56          0        18         0        32     72.33%
Pred. 53:            7      632        3          9         2         0         4     96.19%
Pred. 80:          120       14      676          0        19         3        23     79.06%
Pred. 1214:          5        0        1        370         5        34         1     88.94%
Pred. 445:           2        4       15          2       269         1         1     91.50%
Pred. 137:           0        1        0          0         2       700         0     99.57%
Pred. 22:           36        0       19          1        57         2        94     44.98%
Recall:         63.68%   95.90%   87.97%     96.86%    72.31%    94.59%    60.65%

Overall accuracy: 85.70%

Table D.3: Confusion matrix of weighted motif profiles
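The precision and recall figures in Tables D.1–D.3 follow the standard confusion-matrix definitions: each diagonal entry divided by its row (predicted-class) total gives precision, and divided by its column (true-class) total gives recall; overall accuracy is the diagonal sum over the grand total. A minimal sketch with a hypothetical two-class matrix:

```python
def precision_recall(M):
    """M[i][j]: number of class-j instances predicted as class i.
    Returns per-class precision (diagonal / row sum) and
    recall (diagonal / column sum)."""
    n = len(M)
    precision = [float(M[i][i]) / sum(M[i]) for i in range(n)]
    recall = [float(M[i][i]) / sum(M[j][i] for j in range(n))
              for i in range(n)]
    return precision, recall

# Hypothetical 2-class confusion matrix
M = [[8, 2],
     [4, 6]]
p, r = precision_recall(M)
# p == [0.8, 0.6]; r == [8/12, 0.75]
```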

Vita

Edward G. Allan, Jr.

Personal

• 4259 Cezanne Cir. Ellicott City, MD 21042 email: [email protected] phone: (443) 812-6232

Education

• Master of Science, Computer Science Wake Forest University, Winston-Salem, NC December 2008 Thesis: “Identifying Application Protocols in Computer Networks Using Vertex Profiles” GPA: 3.79

• Bachelor of Science, Computer Science Wake Forest University, Winston-Salem, NC December 2006 GPA: 3.51

Publication

• Allan, Edward G., Horvath, Michael R., Kopek, Christopher V., Lamb, Brian T., Whaples, Thomas S., and Berry, Michael W.: Anomaly Detection Using Nonnegative Matrix Factorization, Survey of Text Mining II, Springer, 203–217, 2008

Experience

• Research Assistant Wake Forest University, Winston-Salem, NC August 2007 – December 2008 Worked with Dr. Errin Fulp on various projects. Researched topics in computer networks leading to this master’s thesis. Assisted in classroom and lab duties for a networking class.

• Software Development Intern GreatWall Systems, Inc., Winston-Salem, NC June 2007 – August 2007 Designed and programmed a testing platform for new high-speed firewall product.


Implemented portions of firewall software in the Python programming language to allow firewall policies to be swapped in place with no gap in coverage.

• Intern – R&D team Tenable Network Security, Columbia, MD June 2006 – August 2006 Developed, implemented, and tested Nessus vulnerability scanner plugins. Implemented code for the Tenable Log Correlation Engine product using a proprietary language. Analyzed, assessed, and scored software vulnerabilities according to the Common Vulnerability Scoring System for use with the Nessus vulnerability scanner.

Honors

• Inducted into the Upsilon Pi Epsilon honor society in 2005

• Graduated cum laude from Wake Forest University in 2006

• 2nd place in the 2007 SIAM text mining competition