Network Security Monitoring and Analysis Based on Big Data Technologies

University of Nevada, Reno

Network Security Monitoring and Analysis based on Big Data Technologies

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

by Bingdong Li

Dr. Mehmet Hadi Gunes, Dissertation Co-Advisor
Dr. George Bebis, Dissertation Co-Advisor

December, 2013

© Copyright by Bingdong Li 2013. All Rights Reserved.

THE GRADUATE SCHOOL

We recommend that the dissertation prepared under our supervision by BINGDONG LI entitled Network Security Monitoring and Analysis based on Big Data Technologies be accepted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Mehmet Hadi Gunes, Ph.D., Co-Advisor
George Bebis, Ph.D., Co-Advisor
Murat Yuksel, Ph.D., Committee Member
Minggen Lu, Ph.D., Committee Member
Dong Yu, Ph.D., Committee Member
Yantao Shen, Ph.D., Graduate School Representative
Marsha H. Read, Ph.D., Dean, Graduate School

December, 2013

Abstract

Network flow data provide valuable information for understanding the network state and for awareness of network security threats. However, processing the large amount of data collected from the network and providing real-time information remain big challenges. Big data technologies provide new approaches to collect, store, measure, and analyze large amounts of data. This dissertation aims to provide a system for network security monitoring and analysis based on big data technologies. First, I present an extensive survey of network flow applications that covers past research perspectives and methodologies, along with a discussion of challenges and future work. Then, I present the system design of the network security monitoring and analysis platform based on big data technologies.
Components of this system include Flume and Kafka for real-time distributed data collection; Storm for real-time distributed stream processing; Cassandra for NoSQL data storage; and components for data processing and user interfaces. The system supports real-time continuous network monitoring, interactive visualization, network measurement, and modeling to classify host roles based on host behaviors and to identify a particular user among other users. It is critical to continuously monitor the network status and network security threats in real time, but processing the large amount of data in real time is a challenge. I demonstrate how the big data security system designed in this dissertation supports such features. Another use of network flow data is to measure the contents of the network. I demonstrate how this big data system provides an understanding of the usage of anonymity technologies on the campus network. Then, I present methods and results for the classification and identification of network objects based on the big data system designed in this dissertation. Finally, I use Decision Tree and Support Vector Machine to model host role behaviors and user behaviors. Sample results indicate very high accuracy of host role classification and user identification.

Dedication

This work is dedicated to my family: to my parents, who loved me unconditionally but with whom I could not be in their last minutes, which I regret the most in my life, and to my wife Zheng Chen and my son Daniel, who sweeten my heart day and night.

Acknowledgements

I would like to take this opportunity to thank my advisors Dr. Mehmet Gunes and Dr. George Bebis for their advice and encouragement. This work would not have been successful without their guidance. Special thanks to my manager Jeff Springer, who supported me in many ways, encouraging me to pursue a PhD degree and providing the research environment. This work would not have been possible without his support. Thanks to Dr.
Murat Yuksel, Dr. Yantao Shen, Dr. Minggen Lu, and Dr. Dong Yu for agreeing to serve on my dissertation committee. I would also like to thank all the people who have inspired me to reach this step. Finally, thanks to my family, on whom my flesh and soul depend. My parents taught me the fundamental and most important things in my life: to be a strong, honest, and independent person, to work hard, and to keep tackling difficult things. My wife has been taking care of our home so that I can spend time on my research. My son has been such a great kid and always reminds me to rest at the right time. During my Ph.D. study, I have gone through ups and downs. There were many times I thought of giving up. It is my faith that led me to the finish line.

Table of Contents

Abstract
Dedication
Acknowledgements
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
  1.1 Motivations
  1.2 Objectives
  1.3 Contributions

Chapter 2 Background
  2.1 Network Flow
    2.1.1 NetFlow
    2.1.2 sFlow
    2.1.3 IPFIX
    2.1.4 Network Flow Analysis
  2.2 Big Data and Related Technologies
    2.2.1 File System
    2.2.2 Distributed, Parallel and Concurrent Computing
    2.2.3 Data Collection
    2.2.4 Data Storage
  2.3 Machine Learning
  2.4 Web Technologies
    2.4.1 AJAX
    2.4.2 HTML5
    2.4.3 Nodejs
    2.4.4 Data Visualization on the Web

Chapter 3 A Survey of Network Flow Applications
  3.1 Perspectives
    3.1.1 Network Monitoring, Measurement and Analysis
    3.1.2 Network Application Classification
    3.1.3 User Identity Inferring
    3.1.4 Security Awareness and Intrusion Detection
    3.1.5 Issues of Data Error
  3.2 Methodologies
    3.2.1 Statistics
    3.2.2 Machine Learning
    3.2.3 Profiling
    3.2.4 Behavior-based Approaches
    3.2.5 Visualization
    3.2.6 Anonymization
    3.2.7 Analysis Systems
  3.3 Discussion
    3.3.1 Datasets
    3.3.2 Research Perspectives
    3.3.3 Methodologies
    3.3.4 Challenges
    3.3.5 Future Directions

Chapter 4 The Big Data Security System Design
  4.1 Approach
  4.2 Components of the Security Analysis System
    4.2.1 Data Collection
    4.2.2 Data Storage
    4.2.3 Security Gateway
    4.2.4 Data Processing
    4.2.5 User Interfaces
  4.3 Features
    4.3.1 Real Time Continuous Network Security Monitoring and Interactive Visualization
    4.3.2 Network Measurement
    4.3.3 Advanced Network Modeling
  4.4 Discussion

Chapter 5 Real Time Continuous Network Monitoring and Interactive Visualization
  5.1 Real Time Network Host Querying
  5.2 Real Time Continuous Network Monitoring
    5.2.1 Network Flow Status
    5.2.2 Top N Conversations
  5.3 Interactive Network Security Awareness Visualization
  5.4 Discussion

Chapter 6 A Case Study of Network Flow Measurement
  6.1 Usage of Anonymity Network
    6.1.1 Campus Network Traffic Flows
  6.2 Related Work

Chapter 7 Classification and Identification of Network Objects
  7.1 Methods
    7.1.1 Data Set
    7.1.2 Algorithm
    7.1.3 Modelling
    7.1.4 Ground Truth
  7.2 Hosts Role Classification
    7.2.1 Classification Features
    7.2.2 Classification of Client versus Server
    7.2.3 Classification of Web Email Server versus Web Non-email Server
    7.2.4 Classification of Hosts from Personal Office versus Public Place
    7.2.5 Classification of Hosts from Two Different Colleges
    7.2.6 Feature Contributions
  7.3 User Identification
    7.3.1 Identification Features
    7.3.2 User Identification Results
    7.3.3 Feature Contributions
  7.4 Discussion
    7.4.1 Classification of Network Applications
    7.4.2 Profiling the Host and Network
    7.4.3 Machine Learning Approaches
    7.4.4 Classifying and Clustering Host Roles
    7.4.5 Identifying the User among Others based on Network Flow

Chapter 8 Conclusion

Chapter 9 Future Work
  9.1 Improvement to the Current Work
  9.2 Extensions

