Classifying P2P Activities in Netflow Records: a Case Study (Bittorrnet & Skype) by Ahmed Bashir a Thesis Submitted to the F
Total Page:16
File Type:pdf, Size:1020Kb
Classifying P2P Activities in Netflow Records: A Case Study (BitTorrnet & Skype) by Ahmed Bashir A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of Masters of Applied Science in Systems and Computer Engineering Carleton University Ottawa, Ontario © 2012, Ahmed Bashir To my Father. R.I.P II Abstract This thesis addresses the problem of identifying BitTorrent and Skype activities within Netflow traces. The ability to accurately classify different types of internet traffic using Netflow traces represents a major challenge in the field of internet traffic classification since there is no payload information available with Netflow. Nowadays, P2P applications represent a large portion of the internet traffic and are becoming more difficult to classify. In this thesis a simple yet effective classification method is proposed using a set of heuristics based on the discriminating features and the nature of P2P applications. The presented scheme has been tested with a collection of real data sets. The results of the classification have shown to be accurate when applied to data sets with challenging internet traffic. Furthermore, the results of the proposed scheme have proven to be superior to other existing approaches in terms of the accuracy of the classification. III Acknowledgement First and foremost, I offer my utmost gratitude to my supervisors Dr.Changcheng Huang and co- supervisor Dr. Biswajit Nandy as this thesis would not have been possible without their encouragement, guidance and support from the initial to the final level of this research. I would also sincerely thank everyone who supported me in any kind of way during the completion of this thesis. IV Table of Contents Abstract ................................................................................................................................................................ III Acknowledgement ................................................................................................................................................ IV Table of Contents................................................................................................................................................... V List of Figures ...................................................................................................................................................... VIII List of Tables ......................................................................................................................................................... IX Chapter 1 Introduction .......................................................................................................................................... 1 1.1 Background 1 1.2 Research Problem 3 1.3 Research Objective 4 1.4 Suggested Approach 5 1.5 Summary of Contributions 6 1.6 Thesis Organization 6 Chapter 2 Background Knowledge ......................................................................................................................... 8 2.1 Literature Survey of Traffic Classification Approaches 8 2.1.1 Deep Packet Inspection (DPI) ....................................................................................................................... 9 2.1.2 Port Based Classification ............................................................................................................................ 11 2.1.3 Machine Learning....................................................................................................................................... 12 2.1.4 Host Behavior ............................................................................................................................................. 14 2.2 Approach Selection and Comparison 15 2.3 Approaches for Identifying BitTorrent and Skype 17 2.3.1 BitTorrent Approaches Comparison .......................................................................................................... 17 2.3.2 Skype Approaches Comparison.................................................................................................................. 18 2. 4 P2P Applications 18 2. 4.1 BitTorrent .................................................................................................................................................. 19 2. 4.1.1 BitTorrent Peer Location .................................................................................................................... 20 2.4.2 Skype VOIP ................................................................................................................................................. 22 2.4.2.1 Start up and Login Authentication ...................................................................................................... 22 2.4.2.2 Initiating a Call and Connection........................................................................................................... 25 2.5 Netflow and General P2P features 27 2.5.1 Netflow…………………………………………………………………………………………………………………………………………………28 2.5.2 Features for Classifying P2P Activities ....................................................................................................... 31 2.5.2.1 Port Numbers ...................................................................................................................................... 31 2.5.2.2 TCP Push Flag....................................................................................................................................... 33 V 2.5.2.3 Incoming and Outgoing Flows ............................................................................................................. 34 2.5.2.4 Maximum Transmission Unit (MTU) ................................................................................................... 36 Chapter 3 Classification Heuristics and Proposed Scheme ................................................................................... 39 3.1 Detection of Potential BitTorrent or Skype Hosts 40 3.1.1 P2P Flow Identification .............................................................................................................................. 41 3.1.1.1 Distinguishing between BitTorrent and Skype Flows .......................................................................... 42 3.1.2 Identifying BitTorrent Flows Contacting Trackers ...................................................................................... 46 3.1.3 Identifying Skype Flows Contacting the Skype Server ............................................................................... 48 3.1.4 Identifying BitTorrent Heavy Hitting Flows ................................................................................................ 50 3.2 Host IP and Port Identification and Summary 52 3. 3 Traffic Detection and Extraction Heuristics 56 3.3.1 BitTorrent Traffic Heuristics and Extraction ............................................................................................... 56 3.3.1.1 Incoming Traffic via DHT Contacted Peers .......................................................................................... 57 3.3.1.2 Incoming Traffic from Peers via Trackers (UDP) .................................................................................. 59 3.3.1.3 Incoming Traffic from Peers via Trackers (TCP) ................................................................................... 60 3.3.2 Skype Traffic Heuristics and Extraction ...................................................................................................... 62 3.3.2.1 Extracting All Skype Flows ................................................................................................................... 63 3.3.2.2 Distinguishing Between Skype Call Flows and Skype Control Flows ................................................... 65 3.4 Heuristics Organization, Order Impact & Contributions 68 Chapter 4 Testing and Experimentation ............................................................................................................... 70 4.1 Data Sets Description 70 4.2 BitTorrent Traffic Detection 74 4.2.1 BitTorrent Traffic False Positives and False Negatives ............................................................................... 76 4.2.2 BitTorrent Traffic Accuracy Definition ....................................................................................................... 80 4.2.3 BitTorrent Byte-Wise Classification and Results ........................................................................................ 81 4.2.3.1 Data Set 1 ............................................................................................................................................ 82 4.2.3.2 Data Set 2 ............................................................................................................................................ 84 4.2.3.3 Data Set 3 ............................................................................................................................................ 86 4.2.3.4 Data Set 4 ............................................................................................................................................ 87 4.2.3.5 Data Set 5 ............................................................................................................................................ 90 4.3 BitTorrent Traffic Detection Summary and Limitations 92 4.4 Skype Traffic Detection 93 4.4.1 Skype Flow Evaluation Metrics .................................................................................................................