Masaryk University Faculty of Informatics

Application-Aware Flow Monitoring

Doctoral Thesis

Petr Velan

Brno, Spring 2018

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Petr Velan

Advisor: doc. Ing. Pavel Čeleda, Ph.D.


Acknowledgement

I would like to thank my advisor, colleagues, and fellow researchers for their input and insights that helped to shape my research and this thesis. Furthermore, I could have hardly finished this work without the relentless support of my friends and family. They have my eternal gratitude.

Abstract

The Internet has become a crucial medium for business, entertainment, communication, learning, access to information, and other human endeavours. As society becomes increasingly dependent upon the availability and security of online services, the revenue from disrupting the normal operation of these services increases as well. Gaining unauthorised access, disrupting functionality, stealing and selling confidential data, and impersonation are only a few examples of these disruptions. Various systems, such as firewalls and anti-virus software, are intended to thwart these attacks and reduce the associated risks. To protect users on their networks, administrators often deploy network monitoring systems to detect and mitigate the threats. Network flow monitoring is an instrument which allows observing traffic passing through a given point in the network and gathering aggregated information about the observed network connections. The aggregated nature of the provided information allows flow monitoring to scale to high-speed networks and to monitor traffic rates up to and including hundreds of gigabits per second. Flow data is mainly used for security purposes. However, due to its versatility, it is often used for network management tasks, such as capacity planning and accounting, as well.

As both network communication protocols and attack vectors are becoming increasingly sophisticated, flow monitoring needs to evolve as well. The current trends for flow monitoring are the extraction of supplementary information from application protocols, identification of encrypted traffic, and monitoring of high-speed networks.

This thesis contributes to the progression of flow monitoring by exploring the possibilities unlocked by extending the flow data with application-specific information. We show how the construction of flows is affected by the addition, present the benefits to traffic analysis, and assess the inevitable performance loss. To compensate for the lost performance, several novel optimisation techniques are proposed for the flow monitoring process. Recognising that the increasing deployment of encryption is going to limit the benefits of application flow monitoring, we perform a survey of methods for measurement of encrypted traffic. The thesis is concluded by an outlook towards future possibilities for flow monitoring advancement.

The first contribution of this thesis is a revised definition of flow that attempts to improve and update currently used definitions so that it better matches current flow monitoring practices. A formal definition of flow, together with algorithms for flow construction based on this definition, is provided as well. We demonstrate that the revised flow definition allows for use cases that were not covered by the NetFlow v9 and IPFIX definitions, such as monitoring of traffic containing fragmented packets.

Using the revised terminology and definitions, we focus on application flow monitoring, which is one of the current trends in flow monitoring. Firstly, an overview of the state of application flow monitoring is provided. Secondly, practical definitions of application flow and application flow record are given to circumscribe application flow monitoring. Thirdly, an experimental study on the design of an HTTP application protocol parser is presented. The study quantifies how application flow monitoring increases the demand for computational resources and decreases the performance of the flow monitoring system.
Application flow monitoring provides new data for traffic analysis. We study in detail several use cases to which application flow monitoring can be applied. The first one uses information from HTTP headers to detect new classes of attacks on the application layer. The second one shows how the additional information can be used to analyse the utilisation of IPv6 transition mechanisms. Then, we show that adding geolocation information to flow records can be used for advanced traffic analysis. And last, a method for characterising network traffic is described. It allows comparing multiple network traces and searching for similarities.

The monitoring of high-speed networks is one of the current trends in flow monitoring. Firstly, we describe the state of the art in this area. Secondly, we identify and propose multiple optimisations, including those that rely on programmable network interface cards. Some of these optimisations are then used to build and demonstrate a high-density flow monitoring system capable of processing sixteen 10 Gb links in a single box.

Classification of encrypted traffic and identification of applications using encryption has become a widely researched topic in the last decade. A survey of methods for measurement of encrypted traffic is, therefore, one of the contributions of this thesis. We show that surprisingly detailed information can be obtained using these methods. In specific cases, even the content of the encrypted connection can be established.

This work concludes with a vision for the future of flow monitoring. We identify several directions of future research of flow monitoring, and a novel approach to monitoring of tunnelled traffic and the application layer is proposed. We believe that the perception of the whole application flow monitoring should be revised to facilitate future demands for more complete and better-structured data.

Keywords

network, monitoring, measurement, flow, application flow, NetFlow, IPFIX, encryption, performance, 100 Gbps

Contents

1 Introduction
1.1 Problem Statement
1.1.1 Application Layer Information
1.1.2 Growing Network Speeds
1.1.3 Traffic Encryption
1.2 Research Goals
1.3 Contributions
1.4 Thesis Structure

2 Network Flow Monitoring
2.1 Flow Monitoring
2.1.1 History of Flow Monitoring
2.1.2 Related Technologies
2.2 Flow Definition
2.3 Flow Monitoring Architecture
2.3.1 Terminology
2.3.2 Flow Monitoring Deployment
2.4 Flow Monitoring Process
2.4.1 Packet Capture
2.4.2 Packet Processing
2.4.3 Flow Creation
2.4.4 Flow Export
2.5 Flow Data Processing
2.5.1 Flow Collection
2.5.2 Flow Storage
2.5.3 Flow Processing
2.6 Common Issues
2.6.1 Visible Data Loss
2.6.2 Unobserved Data Loss
2.6.3 Other Issues
2.7 Summary

3 Application Flow Monitoring
3.1 Motivation
3.2 Related Work
3.2.1 Application Parsers
3.2.2 Application Flow Exporters
3.3 Application Flow Definition
3.4 Creating Application Flow
3.4.1 Packet Processing
3.4.2 Flow Creation
3.4.3 Flow Export
3.5 Design of an HTTP Parser: A Study
3.5.1 Related Work
3.5.2 Parser Design
3.5.3 Evaluation Methodology
3.5.4 Parser Evaluation
3.5.5 Conclusions
3.6 Summary

4 Traffic Analysis Using Application Flow Monitoring
4.1 Security Monitoring of HTTP Traffic Using Extended Flows
4.1.1 Related Work
4.1.2 Measurement Tools and Environment
4.1.3 Results
4.1.4 Discussion
4.1.5 Conclusion
4.2 An Investigation Into Teredo and Transition Mechanisms: Traffic Analysis
4.2.1 Related Work
4.2.2 Investigated IPv6 Transition Mechanisms
4.2.3 Methodology and Measurement Setup
4.2.4 Characteristics of IPv4 Tunnel Traffic
4.2.5 Duration and Size of Flows
4.2.6 IPv6 Tunneled Traffic Analysis
4.2.7 Evaluation of IPv6 Adoption
4.2.8 Conclusion
4.3 Large-Scale Geolocation for NetFlow
4.3.1 Related Work
4.3.2 Exporter-Based Geolocation
4.3.3 Collector-Based Geolocation
4.3.4 Prototype Deployment
4.3.5 Use Cases
4.3.6 Conclusions
4.4 Network Traffic Characterisation Using Flow-Based Statistics
4.4.1 Related Work
4.4.2 Methodology
4.4.3 Results
4.4.4 Conclusions
4.5 Summary

5 Flow Monitoring Performance
5.1 Measuring Flow Monitoring Performance
5.1.1 Measuring Overall Throughput
5.1.2 Measuring an Impact of Optimizations
5.2 High-Speed Packet Capture
5.2.1 Linux Network Stack
5.2.2 High-Speed Packet I/O Frameworks
5.3 High-Speed Flow Monitoring
5.3.1 Hardware Acceleration in Flow Monitoring
5.3.2 Flow Exporter Software Optimization
5.4 High-Density Flow Monitoring
5.4.1 Monitoring Architecture
5.4.2 Methodology
5.4.3 Results
5.4.4 Conclusions
5.5 Summary

6 Measurement of Encrypted Traffic
6.1 Motivation and Goals
6.2 A Description of Encryption Protocols
6.2.1 Security
6.2.2 Transport Layer Security
6.2.3 Secure Shell Protocol
6.2.4 BitTorrent
6.2.5 Skype
6.3 Information Extraction from Encrypted Traffic
6.3.1 The Unencrypted Initialization Phase
6.3.2 The Encrypted Data Transport Phase
6.4 A Taxonomy for Traffic Classification Methods
6.5 Payload-Based Traffic Classification Techniques for Encrypted Traffic
6.5.1 Payload-Based Classification Tools
6.5.2 A Comparison of Classification Tools
6.6 Feature-Based Traffic Classification Techniques for Encrypted Traffic
6.6.1 Supervised Machine Learning Methods
6.6.2 Semi-Supervised Machine Learning Methods
6.6.3 Basic-Statistical Methods
6.6.4 Hybrid Methods
6.6.5 A Summary of Machine Learning and Statistical Encrypted Traffic Classification
6.7 Conclusions

7 Next Generation Flow Monitoring
7.1 EventFlow
7.1.1 Related Work
7.1.2 EventFlow Architecture
7.1.3 EventFlow Prototype
7.1.4 Experimental Evaluation
7.1.5 Conclusions
7.2 MetaFlow
7.3 Application Events and Flow
7.4 Summary

8 Conclusion
8.1 Research Goals
8.2 Further Research

Bibliography

A List of Authored Publications
A.1 Impacted Journals
A.2 Conference Proceedings


Chapter 1

Introduction

The concept of network flow monitoring was proposed in 1991 to facilitate accounting of network usage. Cisco's NetFlow became a de-facto standard for acquiring IP network and operational data in the following years. However, the main potential of flow monitoring became realised much later, in 2005, when Cisco engineers proposed to use NetFlow for anomaly detection and traffic analysis [1]. Flow records, together with packet capture, are the two main sources of data for intrusion detection systems nowadays. Moreover, flow records are being used for data retention [2], which is mandatory for internet service providers in many countries.

Due to the growing cybercrime industry and cyber espionage [3], the traffic analysis and security potential of flow monitoring has become more accentuated over time. To support traffic analysis, Cisco enriched its Flexible NetFlow [4] with information from Network-Based Application Recognition (NBAR) [5] in 2009. NBAR was initially used for QoS management on Cisco appliances. The trend of enriching flow records with information from the application layer continued and resulted in application-aware flow monitoring, which can be seen as a combination of Deep Packet Inspection (DPI) and flow monitoring. Deployment of flow monitoring has become standard practice, since almost every enterprise networking device is able to export flow records nowadays. Steinberger et al. surveyed ISPs and network operators and found out that a majority of them use flow monitoring for attack detection in their networks [6].

The importance of flow monitoring is growing. As the speed of network links increases, DPI-based intrusion detection systems are becoming incapable of handling the sheer amount of traffic. Moreover, the increasing amount of encryption makes it harder still for DPI to be effective. The goal of this thesis is to advance flow monitoring techniques to achieve better application visibility and performance.

1.1 Problem Statement

When flow monitoring is deployed on a network, the measured flow data are sent to a flow data processing system. The system facilitates flow analysis, reporting, and threat detection. All of these functions require high-quality flow data for their operation. For example, when some of the packets are not monitored or are actively sampled, the quality of flow data is significantly reduced [7]. Moreover, when the data contains artifacts [8], the data analysis and threat detection can be impaired as well.

Maintaining a high-quality flow monitoring system is a challenging task due to the constant changes in traffic structure and increasing volume [9]. We have identified three main topics that directly affect flow monitoring and flow data analysis. Firstly, the flow monitoring must keep pace with the increasing speed of networks. Therefore, the performance of flow monitoring must be studied and improved to match the speed of the network links. Secondly, as network attacks are becoming more sophisticated, application layer information must be provided to enable more efficient threat detection. Lastly, the increasing amount of encrypted traffic makes application visibility difficult. Therefore, to maintain any degree of application visibility, novel approaches for monitoring of encrypted traffic must be explored.

1.1.1 Application Layer Information

When we started our research in flow monitoring in 2012, application visibility in flow monitoring was a relatively new concept. With the exception of Flexible NetFlow utilising NBAR, the flow records contained only network and transport layer information. However, even Flexible NetFlow provided only application recognition without extraction of any application-specific data. At the end of 2011, ntop released a version of their nProbe flow exporter capable of utilising the OpenDPI library to provide information about the application protocol [10]. Still, the provided information was only the application protocol name and identifier, which was very similar to what Flexible NetFlow provided.

With the increasing number of discovered vulnerabilities in applications [11], we have found it essential to increase the capabilities of flow monitoring to detect more of these vulnerabilities. Therefore, we have decided to create true application-aware flow monitoring that would allow threat detection algorithms to utilise not only network and transport layer information but also application layer information.

1.1.2 Growing Network Speeds

The implementation of flow monitoring started as an additional feature of routers and switches. However, as the main purpose of these networking devices is not flow monitoring, the quality of data is compromised under high load, where the networking capabilities are of higher importance than the monitoring itself. This, together with the fact that the devices provided only limited configuration options, led to the development of flow monitoring on commodity hardware [12]. It was shown that even monitoring of a Gigabit Ethernet link on commodity hardware is a challenging problem. Therefore, even with increasing CPU frequencies and numbers of cores, flow monitoring of 10 Gbps and faster networks requires the use of special techniques and optimisations.

The recent advances in networking technologies led to the standardisation and deployment of 40 Gbps and 100 Gbps Ethernet links. Moreover, standards for 200 Gbps and 400 Gbps Ethernet were approved at the end of 2017. We have decided to research the possibilities of flow monitoring at these speeds. Moreover, since application-aware flow monitoring requires more performance than basic flow monitoring, we study the impact of processing application payloads on the flow monitoring performance.

1.1.3 Traffic Encryption

Our decision to include application layer information in flows to increase network visibility and aid threat detection was based partially on the amount of unencrypted traffic that could be observed in the network. Research showed that only one-third of web pages could be browsed via HTTPS in 2013 [13]. However, the amount of encrypted traffic has steadily increased, and more than 70 percent of web pages are loaded over HTTPS nowadays [14]. This massive change in the use of encryption necessitates novel approaches to monitoring of encrypted traffic. Therefore, a study of encryption protocols, as well as an overview of statistical methods for traffic classification, is needed to analyse the impact of encryption on the amount of information that can be provided by flow monitoring.


1.2 Research Goals

The goals of this thesis are motivated by the discovered problems. We have identified the following list of goals:

• Propose application flow monitoring which utilises application layer information to facilitate flow analysis and threat detection.

• Evaluate performance of flow monitoring and propose optimisations to facilitate monitoring of high-speed networks.

• Analyse options for monitoring of encrypted traffic, survey common encryption protocols and methods for encrypted traffic classification.

1.3 Contributions

In the course of pursuing the research goals, the following contributions were made in the area of flow monitoring:

• We have proposed a new definition of flow which respects the nature of flow monitoring. A formalisation of the flow definition is provided to make the definition more expressive. Moreover, we have shown how the definition can be used to describe the flow creation process formally as well. (Chapter 2)

• We provide an overview of application flow monitoring that has been implemented in the past years. Consistent terminology is lacking in the area of application flow monitoring, which led us to propose a definition of application flow monitoring together with appropriate terminology. (Chapter 3)

• HTTP application flow monitoring has been implemented, and its performance evaluated. We show that the evaluation of application monitoring performance depends on the dataset used even more heavily than that of basic flow monitoring. (Chapter 3)

• The effect of additional information provided by application flow monitoring has been investigated on several different protocols. We have also analysed the feasibility and benefits of adding geolocation information to flow records. (Chapter 4)

• Flow monitoring performance of commodity hardware was analysed, and multiple optimisations were proposed. The use of hardware acceleration using FPGA-based network interface cards and possible optimisations with the use of these cards were discussed as well. A monitoring system theoretically capable of achieving 160 Gbps throughput was built based on the proposed optimisations. (Chapter 5)

• Analysis of the most widely used encryption protocols was performed, and information suitable for use in application flow monitoring was discovered in the headers of these protocols. Traffic classification methods based on various statistical methods and machine learning were analysed. However, the usefulness of these methods for flow monitoring remains a subject for further research. (Chapter 6)

• Three novel concepts that can benefit flow monitoring and especially application flow monitoring were presented. (Chapter 7)


1.4 Thesis Structure

The rest of this thesis is structured as follows. Chapter 2 describes flow monitoring, its history and present state, as well as lessons learned from the deployment of a flow monitoring system at the CESNET National Research and Education Network. Chapter 3 discusses application flow monitoring, its definition, and terminology. It also studies the implementation of application flow monitoring for the HTTP protocol and evaluates its performance. Chapter 4 consists of four use cases that highlight the benefits of application flow monitoring. Chapter 5 examines the performance of a flow monitoring system on commodity hardware, and a high-density monitoring system for high-speed networks is built and thoroughly evaluated. Chapter 6 analyses possible approaches to coping with monitoring of encrypted traffic. Chapter 7 proposes three novel concepts for application flow monitoring. Finally, Chapter 8 concludes the thesis and presents directions for future work.

Chapter 2

Network Flow Monitoring

Flow monitoring is an essential part of network accounting and security nowadays. It facilitates large-scale intrusion detection and prevention systems, data analysis, capacity planning, data retention, and other operations necessary for network management. The aim of this chapter is to provide a comprehensive introduction to network flow monitoring. Firstly, the history of flow monitoring is outlined, and related technologies are described and compared to flow monitoring. Secondly, the applicability of the currently used flow definition is discussed, and an improved definition is proposed, together with a formal notation. The formal notation allows a clear description of flow-related algorithms and avoids misunderstandings caused by the ambiguities of a natural language. To the best of our knowledge, this is the first attempt to formalise the flow creation process that takes into account all input data used in practice. The rest of the chapter presents the flow monitoring process in detail. Issues encountered while deploying and operating the flow monitoring infrastructure of the CESNET National Research and Education Network are discussed at the end of this chapter.

With the exception of the novel flow definition, this chapter covers a similar topic as the article of Hofstede et al. [15]. The article served as one of the sources of information about the history of flow monitoring and as an inspiration for the structure of this chapter. The reader is encouraged to study the article as an additional source of information. It focuses more on the IPFIX protocol and contains extensive information about the capabilities of existing enterprise and open-source software for flow creation and processing. The paper included in this chapter is [A1]. The organisation of this chapter is as follows:

• Section 2.1 introduces flow monitoring and provides an overview of its history and related technologies.
• Section 2.2 provides an improved flow definition and, based on this definition, a formal description of the flow creation process.
• Section 2.3 explains the flow monitoring architecture.
• Section 2.4 describes the flow monitoring process from packet capture to flow record export.
• Section 2.5 describes flow data treatment from flow record reception to storage and further processing.
• Section 2.6 discusses common issues encountered in flow monitoring operation.
• Section 2.7 summarizes the chapter.


2.1 Flow Monitoring Basics

This section describes the history of flow monitoring, its development, and standardisation efforts. Technologies related to flow monitoring are discussed, and the principal differences are explained.

2.1.1 History of Flow Monitoring

The first mention of a flow export can be found in RFC 1272 [16], published in 1991 by the IETF Internet Accounting (IA) Working Group (WG). The goal of the document was to provide background information on Internet accounting. The authors describe methods of metering and reporting network utilisation. The RFC defines a metering process as follows:

A METER is a process which examines a stream of packets on a communications medium or between a pair of media. The meter records aggregate counts of packets belonging to FLOWs between communicating entities (hosts/processes or aggregations of communicating hosts (domains)).

Internet accounting: Background [16]

The goal at the time was to provide a framework for traffic accounting. However, the common belief was that the Internet should be free and that any form of traffic capture, even for accounting purposes, is undesirable. This, together with the lack of vendor interest, resulted in the conclusion of the working group in 1993. Note that the negative attitude towards monitoring returned more than 20 years later [17].

In 1995, Claffy et al. showed a methodology for internet traffic flow profiling based on packet aggregation [18], which started a revival of flow monitoring efforts. The Realtime Traffic Flow Measurement (RTFM) Working Group was created in 1996, and it had three primary objectives. The first was to consider current issues relating to traffic measurement, such as security, privacy, policies, and requirements on new network protocols. The second was to produce an improved Traffic Flow Model that should provide a wider range of measurable quantities (e.g., IPv6), a simpler way to specify flows of interest, better access control to measured flow data, a strong focus on data reduction capabilities, and efficient hardware implementation. The third objective was to develop the RTFM Architecture and Meter Management Information Base (MIB) as standards track IETF documents. The effort resulted in the publication of several RFCs in 1999, describing a new traffic flow measurement framework with increased flexibility that even provided bi-directional flow support [19]. Since these documents fulfilled the objectives of the RTFM WG, the group was concluded in 2000. However, no flow export standard was developed, as the vendors showed no interest in this area.

Meanwhile, Cisco realised that a similar kind of flow information is already stored in the flow cache of their packet switching devices. The purpose of this cache is to speed up packet switching by making a forwarding decision only for the first packet of each flow. Unlike the RTFM flow measurement framework, the primary purpose of the flow cache is not accounting nor monitoring. Therefore, the configuration of the measurement process using a flow cache in a switch is severely limited. Despite the limitations, once Cisco introduced its flow export technology called NetFlow, it achieved widespread adoption. The main reason for the wide adoption was the fact that it was readily available on most Cisco devices with little effort. NetFlow was patented in 1996, and the first version that became available to the general public, around 2002, was NetFlow v5 [20], albeit Cisco never released any official specification. The NetFlow v5 format simply specified a single set of fields that should be exported from each flow record. Figure 2.1 shows all fields that were supported by NetFlow v5. Note the lack of support for the IPv6 protocol.


Figure 2.1: Fields exported by NetFlow v5 [20]. (The figure groups the exported fields by their use: usage – packet and byte counts; from/to – source and destination IPv4 addresses; time – start and end sysUpTime; application – source and destination TCP/UDP ports; port utilization – input and output ifIndex; routing and peering – next hop address, source and destination AS numbers, source and destination prefix masks; QoS – type of service, TCP flags, and protocol.)
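To make the fixed NetFlow v5 record format tangible, the following sketch decodes a single flow record from raw bytes. It is a minimal illustration only: the 48-byte layout assumed here is the commonly documented NetFlow v5 record format, which matches the fields in Figure 2.1, and is not taken from an official Cisco specification, since none was ever released.

# A minimal sketch (not the thesis' implementation): decoding one NetFlow v5
# flow record.  The 48-byte layout is the commonly documented v5 record
# format and is assumed here, as Cisco never released an official specification.
import socket
import struct

V5_RECORD = struct.Struct(
    "!4s4s4s"   # source IPv4 address, destination IPv4 address, next hop
    "HH"        # input ifIndex, output ifIndex
    "II"        # packet count, byte count
    "II"        # start sysUpTime, end sysUpTime (milliseconds)
    "HH"        # source TCP/UDP port, destination TCP/UDP port
    "BBBB"      # padding, TCP flags, protocol, type of service
    "HH"        # source AS number, destination AS number
    "BB"        # source prefix mask, destination prefix mask
    "H"         # padding
)

def parse_v5_record(data: bytes) -> dict:
    """Decode a single 48-byte NetFlow v5 flow record into a dictionary."""
    (src, dst, nexthop, in_if, out_if, packets, octets, first, last,
     sport, dport, _pad1, tcp_flags, proto, tos, src_as, dst_as,
     src_mask, dst_mask, _pad2) = V5_RECORD.unpack(data)
    return {
        "src_addr": socket.inet_ntoa(src), "dst_addr": socket.inet_ntoa(dst),
        "next_hop": socket.inet_ntoa(nexthop),
        "input_if": in_if, "output_if": out_if,
        "packets": packets, "bytes": octets, "first": first, "last": last,
        "src_port": sport, "dst_port": dport,
        "tcp_flags": tcp_flags, "protocol": proto, "tos": tos,
        "src_as": src_as, "dst_as": dst_as,
        "src_mask": src_mask, "dst_mask": dst_mask,
    }

Because the record layout is fixed, a v5 exporter cannot add IPv6 addresses or any other field, which is precisely the limitation that the template mechanism of NetFlow v9 removes.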

NetFlow v5 was soon obsoleted by NetFlow v9, which remedied some of the deficiencies of the previous version. The state of NetFlow v9 is described in [21]. It allowed defining an arbitrary set of fields for export using templates, as shown in Figure 2.2. It also introduced support for new protocols, such as IPv6, Virtual Local Area Networks (VLAN), Multiprotocol Label Switching (MPLS), Border Gateway Protocol (BGP), or Multicast.

Other vendors created their own versions of flow exporting protocols, although they retained some level of compatibility with NetFlow. There are JFlow by Juniper, CFlow by Alcatel-Lucent, RFlow by Ericsson, and other protocols. When the potential of flow monitoring for security purposes became realised in 2005 [1], more effort was devoted to extending flow records with information not directly associated with switching. Cisco presented the Flexible NetFlow technology [4] in 2006, which allows new types of information, such as parts of payloads or traffic identification, to be dynamically defined and exported.

In 2001, it was clear that exporting flow information from switching devices was going to be supported by vendors. However, no standard flow export protocol existed at the time, and NetFlow v5 was not yet released to the general public. For that reason, the IETF started the IP Flow Information Export (IPFIX) WG [22]. The original charter [23] defined six specific goals for the WG:

• Define “standard IP flow”.

• Devise flow data encoding that supports multiple levels of aggregation.

• Allow packet sampling in IP flow.

• Identify and address security and privacy concerns affecting flow data.

• Specify the transport mapping for IP flow information.

• Ensure that the flow export system is reliable.

The charter was updated over the years to match current requirements. Several vendors were engaged in the IPFIX WG's activities, most notably Cisco, which contributed significantly from the start. The WG defined a set of requirements for the IPFIX protocol [24] and evaluated existing candidate protocols [25] to decide on the most suitable approach to defining the new protocol. The NetFlow v9 specification (RFC 3954) was designed with the IPFIX requirements in mind [26] and was released in order to compete in this evaluation (RFC 3955). After the evaluation, NetFlow v9 was chosen as the basis of the new IPFIX protocol. For this reason, IPFIX is sometimes called NetFlow v10 and even starts with protocol version 10 in its header. However, the IPFIX protocol supports many new features and is not completely backwards compatible with NetFlow.

Figure 2.2: NetFlow v9 protocol structure example [20]. (The figure shows a NetFlow v9 export packet: a header carrying the version, FlowSet count, system uptime, Unix timestamp, package sequence, and source ID, followed by template FlowSets that define record layouts via field types and lengths, and record FlowSets with data records encoded according to the referenced templates.)

The IPFIX WG did more than just design the IPFIX protocol. In the 29 RFCs published before its conclusion, the WG paid attention to, for example:

• Bidirectional flow export [27]

• Architecture for IP flow information export [28]

• Reducing redundancy in flow [29]

• Definitions of Managed Objects (MIB) for IPFIX [30, 31, 32]

• IP flow mediation framework [33, 34]

• IP flow anonymization [35]

• IPFIX configuration data model [36]


The IPFIX protocol specification is described by “Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information” [37], which became an Internet Standard. The working group was concluded in 2014; however, IPFIX-related Internet-Drafts are still being created by involved parties. Further information about IPFIX development is provided by Brownlee in [38].
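Both NetFlow v9 and the IPFIX protocol that grew out of it describe exported data records through templates announced in advance, as illustrated in Figure 2.2. The following minimal sketch shows the idea of template-driven decoding; the template contents, field names, example values, and helper function are illustrative assumptions rather than an excerpt from the thesis or any real collector.

# An illustrative sketch of template-driven record decoding, the mechanism
# shared by NetFlow v9 and IPFIX.  The template below (field names and byte
# lengths) is a hypothetical example, not taken from a real exporter.
TEMPLATE_256 = [                      # template ID 256: (field name, length)
    ("sourceIPv4Address", 4),
    ("destinationIPv4Address", 4),
    ("ipNextHopIPv4Address", 4),
    ("packetDeltaCount", 4),
    ("octetDeltaCount", 4),
]

def decode_data_record(template, data: bytes) -> dict:
    """Slice a data record into fields according to a previously announced template."""
    record, offset = {}, 0
    for name, length in template:
        value = data[offset:offset + length]
        # Addresses are rendered as dotted quads, counters as integers.
        if name.endswith("Address"):
            record[name] = ".".join(str(b) for b in value)
        else:
            record[name] = int.from_bytes(value, "big")
        offset += length
    return record

# One 20-byte example data record described by template 256.
raw = (bytes([192, 168, 1, 12]) + bytes([10, 5, 12, 254]) +
       bytes([192, 168, 1, 1]) + (5009).to_bytes(4, "big") +
       (5344365).to_bytes(4, "big"))
print(decode_data_record(TEMPLATE_256, raw))

The same decoding logic works for any record layout, which is what allows NetFlow v9 and IPFIX exporters to add new fields without changing the export protocol itself.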

2.1.2 Related Technologies

Flow monitoring is not the only network monitoring system used to gain information about network behaviour. There are other technologies that can be used to monitor network traffic and that can sometimes be confused with flow monitoring. We describe sFlow [39], IETF Packet Sampling, OpenFlow, and Deep Packet Inspection in the following text.

sFlow is an industry standard that is supported by a number of vendors in their packet switching devices. Its initial specification was published as an Informational RFC [40] in 2001, which was the time when packet switching/routing devices with sFlow support became available. The most crucial difference from flow monitoring is that sFlow does not actually aggregate a stream of packets into a flow record. Instead, it uses sampling to select individual packets and then exports information available about and from these packets. sFlow allows the export of data from packet headers and chunks of data from packets, and can even parse application payloads. It also maintains interface counters and allows their regular export, which is a feature entirely unrelated to flow monitoring. sFlow version 5 is the latest version and was published in 2004 [39].

In 2002, the IETF started the Packet Sampling (PSAMP) Working Group [41], which was chartered to define a standard set of capabilities for network elements to sample subsets of packets by statistical and other methods [42]. The result is similar to sFlow; however, PSAMP uses the IPFIX protocol for data export [43]. The WG was concluded in 2009 after publishing four RFCs. The proposed standards include sampling and filtering techniques for IP packet selection [44], packet sampling protocol specifications [45], and an information model for packet sampling export [43].

OpenFlow [46] is considered to be one of the first Software Defined Networking (SDN) standards [47, 48]. The idea of SDN is to separate the control plane and data plane of networking devices. This means that the packet forwarding rules are known only to SDN controllers. The other networking devices that process the traffic ask the controllers what to do with individual flows. After the decision is made for the first packet of the flow, a flow record is kept in the cache so that subsequent lookups do not require the controller interaction. OpenFlow is a protocol of communication between the networking devices and the controllers. It has been shown by Yu et al. [49] that the information stored in the flow caches can be exported using the OpenFlow protocol to the controller and used for network monitoring. Although the approach to network monitoring is somewhat similar to flow monitoring on non-SDN networking devices, there are significant differences. The control traffic itself is utilised to transfer data about new and expired flow records. Therefore, the configuration of flow monitoring is directly affected by the configuration of the SDN network and vice versa. This imposes undesirable restrictions on the flow monitoring process. Hendriks et al. assess the quality of flow data from OpenFlow devices in [50] and report numerous problems with the measurement and the resulting data quality. Suárez-Varela and Barlet-Ros propose a more scalable solution to mitigate some of the identified problems in [51]. However, the distributed architecture of the monitoring is usually tightly coupled with the deployment of the network controllers. For these reasons, this thesis does not consider SDN-specific flow monitoring.
It should be noted that SDN-enabled networking devices can still export valid flow data as defined by the IPFIX standard. In any case, the SDN capabilities are irrelevant for flow monitoring purposes.


Deep Packet Inspection (DPI) is an approach to network data analysis where each packet is dissected up to and including the application layer protocol (i.e., packet payload). Although this requires much more resources than standard flow monitoring, it provides maximum information about network traffic. DPI is an approach rather than a specific technology. Therefore, the means of packet capture and information export depend on the particular deployment. For example, sFlow uses DPI to gain information about the application layer from packet payloads and exports this information as part of the sFlow protocol. Despite DPI being diametrically different from flow monitoring, it is being integrated into the flow monitoring process to provide application visibility. This merge balances the detailed view of DPI with the fast and scalable architecture of flow monitoring. This thesis describes how DPI is integrated into flow monitoring to create Application Flow Monitoring.

Neither sFlow nor OpenFlow is discussed any further in this work, and PSAMP is only mentioned as a packet sampling protocol that can be optionally applied to flow monitoring.

2.2 Flow Definition

To be able to describe the flow monitoring process accurately, we need a precise definition of what a flow is. The NetFlow v9 description in [21] uses the following definition:

An IP Flow, also called a Flow, is defined as a set of IP packets passing an Observation Point in the network during a certain time interval. All packets that belong to a particular Flow have a set of common properties derived from the data contained in the packet and from the packet treatment at the Observation Point. Cisco Systems NetFlow Services Export Version 9 [21]

The Observation Point is defined as a location where IP packets can be observed. According to the definition, a flow is a set of packets within a particular time span. Furthermore, the packets in a flow have a set of common properties, and these properties are either derived from data contained in the packet or from packet treatment (e.g., next hop IP address or input interface). Since this definition is quite generic, it covers most of the conventional IP flow creation techniques.

The IPFIX Protocol is an internet standard [37] with its own definition of a flow that builds upon the NetFlow v9 definition. It tries to specify what “properties derived from data contained in packet data” means and differentiates two types of data. The first is the values contained in packet headers; the second type covers the characteristics of the packet itself (e.g., packet length). The definition is as follows:

A Flow is defined as a set of IP packets passing an Observation Point in the network during a certain time interval. All packets belonging to a particular Flow have a set of common properties. Each property is defined as the result of applying a function to the values of:

1. one or more packet header fields (e.g., destination IP address), transport header fields (e.g., destination port number), or application header fields (e.g., RTP header fields [5]).
2. one or more characteristics of the packet itself (e.g., number of MPLS labels)
3. one or more fields derived from packet treatment (e.g., next hop IP address, output interface)

A packet is defined as belonging to a Flow if it completely satisfies all the defined properties of the Flow.

Specification of the IPFIX Protocol [37]


Although this definition is a part of the IPFIX internet standard, there are several problems:

1. It is not clear what a packet header is. One interpretation is that it includes all protocol headers in the packet up to the packet payload (i.e., application layer). However, the transport header is mentioned explicitly, and the example indicates that it can also mean only the network layer, in which case the data link layer is completely ignored.

2. The characteristics of the packet are not sufficiently described. One can interpret this as anything that cannot be computed directly from the packet header fields. The example states that the number of certain types of headers is considered part of the packet's characteristics. The total packet length can also be included here (it was even used as an example in the early drafts in 2002).

3. The IPFIX standard limits the definition of flows only to IP traffic. However, flows are often created with the use of link layer headers. Moreover, the flow concept works even for non-IP connections, e.g., in technological networks. Therefore, a generic flow definition should allow even non-IP packets. It should be noted that the NetFlow v9 definition of flow explicitly defines IP flows, not generic flows.

4. Flows using transport header fields cannot be correctly defined for fragmented IP packets, since transport layer information is present only in the first packet fragment. Both NetFlow v9 and IPFIX define a set of common properties used to decide which flow the packet belongs to. This must be derived only from the single packet, which is not possible in the case of fragmented packets.

In order to provide a complete definition of flow, we must address all the above-mentioned issues. The most direct solution is to start with the NetFlow v9 definition, allow non-IP packets, and be clearer about deriving data from previous packets of the same flow, which is used to correctly handle packet fragmentation. Therefore, the definition used in this thesis is as follows:

Definition 2.1 A flow is defined as a sequence of packets passing an observation point in the network during a certain time interval. All packets that belong to a particular flow have a set of common properties derived from the data contained in the packet, previous packets of the same flow, and from the packet treatment at the observation point.

There are two more terms connected to flow that need to be defined: flow key and flow record. The IPFIX definition of the Flow Key needs to be adapted to our definition of flow. We can conveniently shorten the definition to the following:

Definition 2.2 A flow key is a set of common properties that is used to specify a flow.

A flow record is basically a tuple containing the flow key and other properties measured for the flow. Moreover, we allow the inclusion of more information about the flow that is derived from external sources. An example can be the name of a user to which a particular IP address belonged at the time of measurement. The following definition reflects that:

Definition 2.3 A flow record is a tuple which describes a particular flow containing values of:

1. the flow key used to specify the flow,


2. other properties of the flow derived from:
(a) data contained in the packets of the flow,
(b) the packet treatment of the flow at the observation point,
(c) an external source of information.

To make the definitions above clearer, we provide examples of real properties that might be contained in a flow record in Table 2.1. The table shows examples of flow record properties that can be derived from packet data and packet treatment. The properties can be aggregated when the derived value differs between individual packets of the flow or where counters such as the number of packets are involved. A summation function is usually applied to the number of bytes in each packet, TCP flags are aggregated using a logical OR function, and the flow start timestamp is derived using a minimum function on each packet timestamp. The non-aggregated properties may be used as part of a flow key.

The definition of flow record states that each flow record describes a particular flow. Moreover, in the rest of this thesis, we make the assumption that each flow is described by a single flow record. This is particularly important for long-lived flows that are terminated by an active timeout. Either the whole flow can be terminated and a new one started, or the flow can continue and only the matching flow record is expired. Multiple flow records are created in the latter case. We use the former interpretation since it allows us to simplify the following definitions.

                    Aggregated properties      Non-aggregated properties
Packet data         Number of bytes            Source IP address
                    TCP flags                  Destination port
                    Time to Live               Transport protocol
Packet treatment    Number of packets          Input interface number
                    Flow start timestamp       Next-Hop IP address

Table 2.1: Examples of Flow Properties.
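As a concrete illustration of the aggregated properties in Table 2.1, the following sketch updates a flow record for every observed packet. The record fields, the packet representation, and the use of a maximum function for the flow end timestamp are simplifying assumptions made for this example; it is not the implementation evaluated later in the thesis.

# A simplified sketch of per-packet flow record aggregation (assumed field
# names, not the thesis' implementation).  Each aggregated property has its
# own aggregation function: summation for byte counts, logical OR for TCP
# flags, minimum for the flow start timestamp, maximum for the end timestamp.
def new_record(packet: dict) -> dict:
    """Initialise a flow record from the first packet of the flow."""
    return {
        # Non-aggregated properties, typically part of the flow key.
        "src_ip": packet["src_ip"], "dst_ip": packet["dst_ip"],
        "src_port": packet.get("src_port"), "dst_port": packet.get("dst_port"),
        "protocol": packet["protocol"],
        # Aggregated properties, initialised from the first packet.
        "packets": 1, "bytes": packet["length"],
        "tcp_flags": packet.get("tcp_flags", 0),
        "start": packet["time"], "end": packet["time"],
    }

def update_record(record: dict, packet: dict) -> None:
    """Aggregate one more packet into an existing flow record."""
    record["packets"] += 1                                  # packet counter
    record["bytes"] += packet["length"]                     # summation
    record["tcp_flags"] |= packet.get("tcp_flags", 0)       # logical OR
    record["start"] = min(record["start"], packet["time"])  # minimum
    record["end"] = max(record["end"], packet["time"])      # maximum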

Definition 2.1 states what a flow is. Although we tried to be as explicit as possible, the definition uses informal language and is therefore subject to different interpretations. For this reason, we now provide a formal definition of flow, which not only refines the informal definition but also provides a guide to the construction of flows.

Definition 2.4 Let $P$ be a set of all packets. Let $T$ be a set of packet treatment information. We define a set of extended packets $\hat{P} = P \times T$, so that $\hat{p} \in \hat{P}$ denotes a packet $p$ together with its packet treatment information. Let $\mathbb{S}$ be a set of indexes of packets observed at an observation point:

$$\mathbb{S} = \{1, \ldots, n\},$$

where $n \in \mathbb{N}$ is the number of observed packets. We denote a sequence of packets and a sequence of extended packets observed at an observation point, respectively:

$$\mathbf{P} = (p_i)_{i \in \mathbb{S}}, \quad p_i \in P,$$
$$\hat{\mathbf{P}} = (\hat{p}_i)_{i \in \mathbb{S}}, \quad \hat{p}_i \in \hat{P}.$$


Both sequences are of size $|\mathbb{S}|$. Let us now define a flow selection function ϕ which takes a sequence of extended packets and a new extended packet and decides whether they form a flow. We will use this function to determine whether a newly observed packet belongs to an existing flow.

Definition 2.5 Let $\hat{P}^{*}$ be the set of all finite sequences of extended packets and $\hat{P}$ the set of extended packets. We say that a function of type $\varphi : \hat{P}^{*} \times \hat{P} \to \{\mathit{true}, \mathit{false}\}$ is a flow selection function.

Before we give a formal definition of a flow, we provide the following intuition for our definition. A flow $F$ is a sequence of packets defined by a sequence of extended packets with indexes in $\mathbb{S}$ and a flow selection function ϕ. We require that a packet belong to a flow if it is determined by all previous packets of that flow. Therefore, we construct the flow by induction as described in Algorithm 2.1.

1: Denote $I$ the set of packet indexes that belong to the flow $F$
2: Start with $I = \emptyset$
3: while an index $k$ of the first extended packet $\hat{p}_k$ for which $\varphi((\hat{p}_n)_{n \in I}, \hat{p}_k) = \mathit{true}$ exists do
4:     Add $k$ to $I$
5: end while
6: The flow $F$ is a sequence of packets with indexes from $I$

Algorithm 2.1: Construction of a Flow.

We shall now define the set $I$ of indexes from $(\hat{p}_i)$, selected using a flow selection function ϕ, and a flow $F$, so that it conforms with Definition 2.1, as follows:

Definition 2.6 Let $(p_i)_{i \in S}$, $(\hat{p}_i)_{i \in S}$, $S \subseteq \mathbb{S}$, be mutually corresponding sequences of packets and extended packets respectively, and ϕ a flow selection function. We define a flow index set $I = I((\hat{p}_i)_{i \in S}, \varphi)$ as

$$I = \bigcup_{i \to \infty} J_i,$$

where $J_i$ is defined inductively over $i \in \mathbb{N}$ as:

$$J_i = \begin{cases}
\{\min\{\alpha \in S \mid \varphi(\hat{p}_\alpha) = \mathit{true}\}\} & \text{for } i = 1,\\
J_{i-1} \cup \{\min\{\alpha \in S \mid \alpha > \max(J_{i-1}),\ \varphi((\hat{p}_n)_{n \in J_{i-1}}, \hat{p}_\alpha) = \mathit{true}\}\} & \text{for } i > 1.
\end{cases}$$

Finally, we define flow $F = F((p_i)_{i \in S}, I)$ as:

$$F = (p_i)_{i \in I}, \quad p_i \in P.$$

Since we need the min function to be defined for an empty set (the cases where no flow is defined and where we have already added all possible indexes from $S$), we define $\{\min \emptyset\} = \emptyset$.

Definition 2.6 of flow creates a single flow for a sequence of extended packets $\hat{\mathbf{P}}$ and a flow selection function ϕ. The flow $F$ is selected based on the first extended packet accepted by ϕ. Since we naturally expect that every packet is a part of only a single flow, we can construct a sequence of flows $(F_i)_{i \in \mathbb{N}}$ by induction as described in Algorithm 2.2. Let us now provide a more formal definition of a sequence of flows $(F_i)_{i \in \mathbb{N}}$.


1: Denote $S_1 = \mathbb{S}$
2: Set counter $i = 1$
3: repeat
4:     Apply the flow selection function ϕ to extended packets with indexes in $S_i$
5:     Denote indexes of matching extended packets $I_i$
6:     Flow $F_i$ is a sequence of packets with indexes from $I_i$
7:     Remove indexes in $I_i$ from $S_i$, denote the new sequence $S_{i+1}$
8:     Increment counter $i = i + 1$
9: until $S_i$ is empty

Algorithm 2.2: Construction of a Sequence of Flows.

Definition 2.7 Let $\mathbf{P}$, $\hat{\mathbf{P}}$ be sequences of packets and extended packets respectively, and ϕ a flow selection function. We define the sequence $(F_i)_{i \in \mathbb{N}}$ of flows inductively:

$$F_i = F_i\left((p_j)_{j \in S_i}, I_i\right), \quad \text{where } S_1 = \mathbb{S}, \quad S_i = S_{i-1} \setminus I_{i-1}, \quad I_i = I_i\left((\hat{p}_j)_{j \in S_i}, \varphi\right).$$

Definition 2.7 provides a guide to constructing a sequence of flows. The procedure can be easily modified to run in real time so that each newly observed extended packet can be added to the appropriate flow. In our definition, we want every packet to be a part of only a single flow. Therefore, we apply the flow selection function ϕ to each pair of an existing flow (enriched by packet treatment information) and the new extended packet. Then, we add the packet to the first flow that matches. If none of the existing flows matches, we apply the function ϕ to this packet only and start a new flow if necessary.

From this, we can see that the flow creation process depends solely upon the implementation of the flow selection function. We will shortly discuss common implementations in Subsection 2.4.3.

The condition for every packet to be part of exactly one flow might be too constricting in some cases. It is possible to remove the condition and simply start building each flow from the next extended packet in the sequence. However, this will create many overlapping flows that contain mostly the same packets but start with a different one. This problem would need to be addressed should such a definition be used.
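To make the role of the flow selection function concrete, the sketch below partitions a packet trace into flows in a single pass, in the spirit of Algorithm 2.2. Using a five-tuple key corresponds to a flow selection function that accepts a new packet into a flow whose packets share that key; this particular choice of ϕ, the simplified packet representation, and the absence of timeouts (which are discussed in Subsection 2.4.3) are assumptions made only for this example.

# A minimal sketch of one-pass flow construction driven by a flow selection
# criterion, in the spirit of Algorithm 2.2 (not the thesis' implementation).
from typing import Callable, Dict, List, Tuple

Packet = dict   # simplified extended packet: header fields plus treatment info
Flow = List[Packet]

def five_tuple(pkt: Packet) -> Tuple:
    """A common flow key: packets with an equal five-tuple form one flow."""
    return (pkt["src_ip"], pkt["dst_ip"],
            pkt.get("src_port"), pkt.get("dst_port"), pkt["protocol"])

def build_flows(packets: List[Packet],
                key: Callable[[Packet], Tuple] = five_tuple) -> List[Flow]:
    """Assign every packet to exactly one flow, starting a new flow whenever
    no existing flow accepts the packet."""
    flows: Dict[Tuple, Flow] = {}
    for pkt in packets:                       # packets in observation order
        flows.setdefault(key(pkt), []).append(pkt)
    return list(flows.values())

In a real-time exporter the same idea is applied incrementally: each arriving packet is matched against the flow cache, aggregated into the matching record, and expired records are exported, as described in the following sections.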

2.3 Flow Monitoring Architecture

Deployment of flow monitoring on a network requires several steps: capturing packets at one or more observation points, assigning packets to flows, creating and exporting flow records for the flows, and finally collecting, storing, and processing the exported flow records. Figure 2.3 shows a high-level overview of the whole process. The flow monitoring process encompasses packet capture, flow creation, and the creation and export of flow records. Flow data processing comprises flow record collection, storage, and further processing. Note that the flow records can be processed directly without storing. This approach is called stream processing. It is also possible to manipulate the flow records in transition between the export and collection. The IPFIX working group specified a framework called IP Flow Information Export Mediation [34] which describes this process. However, a description of this process is outside the scope of this thesis.

The rest of this section explains the basic terminology and components of the flow monitoring architecture and describes the most commonly deployed flow monitoring architectures.


Figure 2.3: High-Level Flow Monitoring Schema. (The figure depicts the flow monitoring process – packet capture, flow creation, and flow export – passing flow records over a flow export protocol to flow data processing – flow collection, flow storage, and flow processing.)

2.3.1 Terminology

The IPFIX working group published several documents where the architecture for IP Flow Information Export is described [28, 34]. However, the terminology used in these documents is not commonly used by the flow monitoring community, and some of the terms have a different meaning depending on the context. We start by presenting the IPFIX reference model as described in RFC 5470 [28], which can also be used to describe a generic flow monitoring architecture. Then we identify the terms that are often used with a different meaning and explain how these terms are used throughout this thesis. We provide the IPFIX terminology definitions from RFC 7011 [37] here for the convenience of the reader:

Observation Point
An Observation Point is a location in the network where packets can be observed. Examples include a line to which a probe is attached; a shared medium, such as an Ethernet-based LAN; a single port of a router; or a set of interfaces (physical or logical) of a router. Note that every Observation Point is associated with an Observation Domain (defined below) and that one Observation Point may be a superset of several other Observation Points. For example, one Observation Point can be an entire line card. That would be the superset of the individual Observation Points at the line card's interfaces.

Observation Domain
An Observation Domain is the largest set of Observation Points for which Flow information can be aggregated by a Metering Process. For example, a router line card may be an Observation Domain if it is composed of several interfaces, each of which is an Observation Point. In the IPFIX Message it generates, the Observation Domain includes its Observation Domain ID, which is unique per Exporting Process. That way, the Collecting Process can identify the specific Observation Domain from the Exporter that sends the IPFIX Messages. Every Observation Point is associated with an Observation Domain. It is RECOMMENDED that Observation Domain IDs also be unique per IPFIX Device.

Packet Treatment
"Packet Treatment" refers to action(s) performed on a packet by a forwarding device or other middlebox, including forwarding, dropping, delaying for traffic-shaping purposes, etc.

Metering Process
The Metering Process generates Flow Records. Inputs to the process are packet headers, characteristics, and Packet Treatment observed at one or more Observation Points. The Metering Process consists of a set of functions that includes packet header capturing, timestamping, sampling, classifying, and maintaining Flow Records. The maintenance of Flow Records may include creating new records, updating existing ones, computing Flow statistics, deriving further Flow properties, detecting Flow expiration, passing Flow Records to the Exporting Process, and deleting Flow Records.

Exporting Process
The Exporting Process sends IPFIX Messages to one or more Collecting Processes. The Flow Records in the Messages are generated by one or more Metering Processes.

Exporter
A device that hosts one or more Exporting Processes is termed an Exporter.

IPFIX Device
An IPFIX Device hosts at least one Exporting Process. It may host further Exporting Processes as well as arbitrary numbers of Observation Points and Metering Processes.

Collecting Process
A Collecting Process receives IPFIX Messages from one or more Exporting Processes. The Collecting Process might process or store Flow Records received within these Messages, but such actions are out of scope for this document.

Collector
A device that hosts one or more Collecting Processes is termed a Collector.

Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information [37]

Figure 2.4 shows various scenarios of the flow monitoring architecture as defined by the IPFIX working group. IPFIX exporters and IPFIX devices are part of the flow monitoring process; collectors and applications represent flow data processing, as described in Figure 2.3. The IPFIX Reference Model [28] allows differentiating between devices that only export flow records and devices that receive data from an observation point and perform the metering process themselves. This can be useful for describing, for example, development setups where flow records are replayed or generated without the necessity of monitoring live network traffic. Collectors comprise collecting processes and data processing applications. The reference model shows that it is possible to collect flow records from multiple sources on a single collector and that the applications can run directly on the collector or be distributed to other machines. For example, a collecting process might convert the flow records in IPFIX format to JSON and feed a big data processing framework that runs on a cluster of machines.

Although the IPFIX Reference Model describes the flow monitoring architecture in detail, it is not used by the flow monitoring community unanimously. Most of the terms used in the IPFIX standard are simplified, and their true meaning often depends on the context. Table 2.2 provides the common terms often used instead of the IPFIX terminology. Examples of the use of the common terminology can be found, for example, in [15, 52, 38, 53, 54, 55, 56]. We usually talk about a monitored link and a set of monitored links (e.g., both directions of a network connection when optical fibres are used) instead of an observation point or an observation domain. The terms exporter or flow exporter usually refer to the software that performs flow monitoring (both the metering and exporting processes). If a device generates flows without observing and processing the packets first (i.e., an Exporter in IPFIX terminology), we call it a flow source or a probe.


Figure 2.4: IPFIX Reference Model [28].

IPFIX Terminology      Common Terminology
Observation Point      Monitored Link, Observed Link
Observation Domain     Set of Monitored Links
Packet Treatment       Packet Treatment
Metering Process       (Flow) Exporter [software]
Exporting Process      (Flow) Exporter [software]
Exporter               Flow Source, (Flow) Probe
IPFIX Device           Flow Source, (Flow) Probe, Switch, Router, ...
Collecting Process     (Flow) Collector [software]
Collector              (Flow) Collector [device]

Table 2.2: Flow Monitoring Architecture Terminology.

The IPFIX Device is usually called by a more specific name, such as a switch, a router, or, in the case of a dedicated device possibly equipped with specialised hardware and software, a probe. Furthermore, the term flow source can also be used for any device that generates flow records. Finally, both the software for collecting flow records and the device where such software runs are commonly called a collector, or a flow collector. We will be using the common terminology throughout this thesis.

2.3.2 Flow Monitoring Deployment

The deployment of flow monitoring requires careful planning so that it does not disturb the existing network infrastructure. There are several decisions that must be made, such as choosing a proper flow source, a collector, and their location relative to the monitored network.

Although it is possible to monitor wireless and virtual networks, we focus on the deployment in wired networks.

The selection of the flow source depends upon many variables, such as cost, the required quality of exported data, or the type of the monitored link. Two types of flow source are generally available. The first are the active networking devices that are already present in the network and provide flow monitoring functionality as well. Switches, routers, and firewalls belong to this category. When such a device is present at a convenient point in the network, it is sufficient to configure it for flow export, and no adjustments to the network are needed. Moreover, internal information such as IP addresses behind NAT (Network Address Translation), which would be difficult to access otherwise, can be added to exported flow records. The disadvantage of these devices is that flow monitoring is not their primary function and may not be performed correctly under high load (i.e., under certain types of attack). Also, the range of supported options and protocols is usually much narrower than that of dedicated probes. The reason for deploying a dedicated probe is usually the need for some extra functionality or guarantees that cannot be met by the networking devices. Furthermore, it allows separating network configuration and maintenance from network monitoring, which can be useful if they are handled by different divisions of an organisation.

The active networking devices observe packets as part of their function; dedicated probes, however, need to be provided with access to the data. There are two ways for probes to observe the data: in in-line mode or in mirrored mode.

In-line mode – A device in in-line mode is connected directly to the monitored link and has to actively pass the packets in order for the link to function. This is the mode of operation of active network devices such as switches and routers. The advantage is that no other device is necessary to mirror the traffic; however, if the probe fails, the operation of the network is disrupted. Moreover, it could introduce significant latency and jitter to the network connection.

Mirrored mode – In the mirrored mode, a copy of the network traffic is created and delivered to the probe using a dedicated link. The copy can be created either by a dedicated TAP (Test Access Point) or by an active networking device with the use of a SPAN (Switch Port Analyzer) port. Table 2.3 shows the differences, advantages, and disadvantages of the TAP and SPAN solutions. An analysis of traffic trace artefacts caused by port mirroring was performed by Zhang and Moore in [57].

A TAP is a passive splitting mechanism which requires in-line installation. However, due to the simplicity of the device (optical TAPs use mirrors and require no power source) and integrated fail-safes (bypass TAPs), the risk of negatively affecting the network is very low. A SPAN port is a special port on an active networking device that provides a copy of the traffic passing through the device for analysis and monitoring purposes.

Since flow monitoring is passive by nature, it is good practice to mirror the live traffic and provide only a copy of the data to the probe so that the network cannot be affected by the monitoring process. The selection of the mirroring technology needs to be considered for each deployment based on specific requirements and limitations, such as the utilisation of the measured network and the impact of packet drop on the performed analysis. Figure 2.5 shows the most common deployment of flow monitoring at the edge of the network using both presented options: flow export directly from the router and a dedicated probe supplied with data either by a TAP or by a router SPAN port.


Differences
  TAP:
    • RX & TX signal delivered on separate ports
    • Captures everything on the wire, including MAC and media errors
    • Complete capture for a 100 % saturated network
  SPAN:
    • RX & TX copied into one TX signal
    • Hardware and media errors are dropped
    • Possible packet drop due to SPAN link capacity limit

Advantages
  TAP:
    • Monitoring device receives identical data, including errors
    • Keeps link directions separate
  SPAN:
    • Low cost
    • No changes to network topology
    • Aggregation of multiple links

Disadvantages
  TAP:
    • Analysis device may need a dual-port capture interface
    • Additional costs for the TAP
    • Necessity to install an additional device
  SPAN:
    • Packet drop on fully utilized full-duplex links
    • SPAN port data has lower priority than port-to-port data
    • Some analyses require observation of physical layer errors
    • Loss of information about link
    • Increased switch CPU utilization
    • Can change the timing of frames

Table 2.3: Differences Between TAP and SPAN Mirroring Options.

Figure 2.5: Flow Monitoring Deployment Schema.

The location of the flow source in the network is essential. When the monitored network is connected to the outside at multiple points, it is necessary not only to monitor each of these points but also to make sure that the routing policies are reasonable, so that packets of each flow traverse only one of these points. Otherwise, the flow source would have to have access to all links and monitor them as a single source, which is not easily achieved, especially when the connection points are geographically distributed. It is a good practice to keep the information about which flow source exported which flow records on the collector in order to detect possible configuration problems more efficiently.


When monitoring a large network, such as a university campus network, multiple flow sources can be utilised to gain more detailed information about the traffic of individual faculties. Special care needs to be taken when a packet traverses multiple observation points. Counting the same traffic multiple times can have a negative impact on subsequent data analysis. Deployment of NAT impedes network visibility. Some routers that perform the address translation are able to export both the original and translated addresses in the flow records. However, observation points of dedicated flow probes are located either before or after the network device which performs the translation. Although some effort has been dedicated to the detection of NAT from data provided in flow records [58, 53], it does not solve the problem of finding the correct source of the communication behind the NAT. The correct approach would be either to place a second probe at a location where the internal addresses are still visible or to export the information about the translation to the collector and pair it with existing flow records.

2.4 Flow Monitoring Process

This section aims to describe the process of converting raw packets and the corresponding packet treatment information to flow records. We show how the flow selection function from Definition 2.5 relates to this process. For the purposes of this thesis, we will from now on consider dedicated probes equipped with flow exporter software. Although the flow monitoring process in networking devices is similar, it has differences and specifics that are outside the scope of this work. This section provides a generic overview of the flow monitoring process; performance-related details are left for Chapter 5. Figure 2.6 shows a common flow monitoring process and its individual parts, which are discussed in the rest of this section. An alternative description of the flow monitoring process can be found in the IPFIX Reference Model [28]. We will refer to this description where appropriate.

2.4.1 Packet Capture

The packet capture ensures that data from the network are made available for processing in software. Standard Network Interface Cards (NICs) can be used, as well as specialised hardware-accelerated cards. The schema shows a standard NIC performing a CRC check on received packets. The NIC has to be put in a special monitoring mode in order to receive packets with different destination MAC addresses. These packets are passed to the NIC driver, usually through Direct Memory Access (DMA). The driver either passes the received packets to the operating system or provides its own application interface for accessing the packets from user space. The performance of the different approaches differs significantly, and we will discuss it in detail in Chapter 5. Each packet needs to be assigned a (preferably unique) timestamp that marks the point in time at which the packet was received. This process is called packet timestamping. The timestamp can be assigned by the NIC, the NIC's driver (or the OS), or the software application processing the packet. Support for NIC timestamping is provided by some of the hardware-accelerated NICs. As timestamping in software can be computationally intensive, we will also discuss it in Chapter 5. The packet treatment information is provided together with the packets by the NIC's driver. Usually, at least the interface of the NIC on which the packet was captured and the timestamp of the capture are available. Networking devices can provide more information, such as the egress interface, the next hop, or the autonomous system number of the destination. This information is not directly available when dedicated probes are used, but can be added externally if necessary.
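For illustration, the following is a minimal sketch of software packet capture with timestamping, assuming a Linux host, root privileges, and a standard NIC already switched to promiscuous mode; the interface name and the use of the AF_PACKET socket interface are assumptions made for the example and do not correspond to any particular exporter implementation.

# Minimal capture sketch: receive whole layer-2 frames and timestamp them in
# software. Hardware-accelerated NICs and kernel-bypass drivers are accessed
# very differently; this only illustrates the capture and timestamping steps.
import socket
import time

ETH_P_ALL = 0x0003  # receive frames of every Ethernet protocol

def capture(interface: str = "eth0"):
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
    sock.bind((interface, 0))
    while True:
        frame, _addr = sock.recvfrom(65535)
        # Software timestamping: taken when the application sees the frame,
        # not when the NIC observed it on the wire.
        yield time.time(), frame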


Figure 2.6: Detailed Overview of the Flow Monitoring Process in a Dedicated Probe.


2.4.2 Packet Processing

The task of packet processing is to extract the values of chosen properties of individual packets and the corresponding packet treatment information. Attributes such as IP addresses, the transport protocol, and ports are used as flow keys. The set of used flow keys depends on the applied flow selection function, which is used to decide to which flow the packet belongs. Other attributes of the packets, such as TCP flags or the number of bytes, are extracted as well for further analysis. We call the extracted properties packet metadata or partial flow records. The packet metadata are passed to the flow creation part of the flow processing, where they are used to create new or update existing flow records. A secondary task of packet processing is packet sampling and packet filtering, as mentioned in [28]. Packet sampling is usually done to reduce the amount of processed data in order to compensate for a lack of performance. There are several sampling techniques that can be used, as described in [45]. Packet sampling is performed before extracting the metadata. Packet filtering can be performed after the metadata are extracted and is based on the extracted values. It is common to monitor only a specific network or a section of the network; therefore, the filtering rules are usually based on the extracted IP addresses. Both packet sampling and packet filtering can be implemented in hardware-accelerated NICs as well.
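A simplified view of this extraction step is sketched below; it handles only untagged IPv4 over Ethernet carrying TCP or UDP, which is an assumption made for brevity, and the chosen set of flow keys and metadata fields is illustrative rather than prescriptive.

# Extract a 5-tuple flow key and a few metadata fields from a raw Ethernet
# frame. Real packet processing covers VLANs, IPv6, tunnels, options, etc.
import struct
from typing import NamedTuple, Optional

class PacketMetadata(NamedTuple):
    flow_key: tuple   # (src IP, dst IP, protocol, src port, dst port)
    length: int       # IP total length in octets
    tcp_flags: int    # 0 for non-TCP packets

def parse_packet(frame: bytes) -> Optional[PacketMetadata]:
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None                        # not an untagged IPv4 packet
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4               # IPv4 header length
    proto = ip[9]
    total_len = struct.unpack("!H", ip[2:4])[0]
    sport = dport = tcp_flags = 0
    l4 = ip[ihl:]
    if proto in (6, 17) and len(l4) >= 4:  # TCP or UDP
        sport, dport = struct.unpack("!HH", l4[:4])
        if proto == 6 and len(l4) >= 14:
            tcp_flags = l4[13]             # lower byte of the TCP control bits
    return PacketMetadata((ip[12:16], ip[16:20], proto, sport, dport),
                          total_len, tcp_flags)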

2.4.3 Flow Creation

The extracted packet metadata are aggregated to create flow records. The definition of the flow selection function requires all preceding extended packets of the same flow to determine whether a new packet belongs to the particular flow. Since it is not viable to keep all packets of a flow in memory, only selected information is stored in real-world implementations. All active flows have a flow record with all the necessary information stored in a flow cache. When a new packet arrives, the flow selection function is called for each stored flow record and the metadata of the new packet to determine to which flow the new packet belongs. If a matching flow record is found, it is updated using the packet metadata (e.g., the packet and byte counters are incremented, and the flow end timestamp is updated). If no such record exists, the packet is considered to be the first packet of a new flow, and a corresponding flow record is created in the flow cache. Algorithm 2.3 illustrates the flow creation process. It is worth noting that the flow selection function is denoted φ, as we refer to a concrete implementation here instead of the formal definition. Testing the extracted packet metadata against each flow record has a linear time complexity with respect to the size of the flow cache. A common optimisation is to compute a hash for each flow record such that it can also be computed for the extracted metadata. The flow cache is then either indexed by the computed hash [56] or directly implemented as a hash table [59] so that the insert and lookup operations have a constant time complexity. Collisions can be solved by a number of common techniques or by prematurely exporting the older colliding flow record. This optimisation is so widely used that its use is often implicit, but it is important to keep in mind that the new packet is still effectively tested against each active flow. Other implementations of flow caches can be found in the literature, such as the hypercube flow table by Wang et al. [60]. There are several reasons why a flow record can exit the flow cache (a sketch combining the hash-indexed cache with these expiration rules is given after the list below):

Inactive timeout occurs when no new packets belonging to the flow arrive for a time interval called the inactive timeout. This timeout is used to end inactive connections and has a significant influence on flow cache memory requirements. When set too low, the inactive timeout can erroneously split idle connections where keep-alive packets are sent at intervals longer than the inactive timeout. The inactive timeout is sometimes called the idle timeout as well.


loop
    Get new packet P
    Extract packet metadata M
    Set found = false
    for all flow records F in the flow cache do
        Apply flow selection function φ to F and M
        if φ(F, M) = true then
            Aggregate M to F
            Set found = true
            break
        end if
    end for
    if not found then
        Create new flow record F from M
        Insert F into the flow cache
    end if
end loop

Algorithm 2.3: Construction of Flow Records.

Active timeout occurs when the flow is longer than a time interval called the active timeout. The reason for the active timeout is to keep the exported flow information fresh. For example, some SSH connections may be active for months when left open in a UNIX/Linux screen utility, and without the active timeout, the information about the connection would arrive too late for any practical purpose, such as monitoring the amount of traffic for a given time period. The active timeout is usually much longer than the inactive one. For example, the Cisco IOS flow cache has a default of 30 minutes for the active timeout and 15 seconds for the inactive one [61].

Natural expiration occurs when the connection is closed normally. This is often implemented for TCP connections, where packets with the RST or FIN flag indicate the end of the connection.

Resource constraints such as a lack of memory can cause flow records to be exported from the flow cache to allow new flows to be inserted. Some implementations of flow caches using hash tables have a fixed number of flows that can be stored under a single hash. Therefore, when a new flow record needs to be created, one of the existing flow records is usually expired.

Exporter shutdown usually causes all flow records in the flow cache to be exported so that the information about observed packets is not lost. This cause is not often mentioned in the literature; however, we consider it an important one. The exporter must be restarted each time the configuration is changed unless it supports dynamic reconfiguration, which none of the open-source exporters does. Therefore, the exporter shutdown can happen surprisingly often.
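The following sketch combines the hash-indexed flow cache mentioned above with the inactive timeout, active timeout, and exporter shutdown expiration causes. The timeout values, record fields, and the export callback are illustrative assumptions (the metadata object is expected to carry the flow key, length, and TCP flags, e.g., as in the earlier parsing sketch); real exporters track many more counters and expiry reasons.

# Hash-indexed flow cache sketch: a dict replaces the linear scan of
# Algorithm 2.3, and expire() implements two of the timeout rules above.
import time

INACTIVE_TIMEOUT = 15.0        # seconds, cf. the Cisco IOS default
ACTIVE_TIMEOUT = 30 * 60.0     # seconds

class FlowCache:
    def __init__(self, export):
        self.records = {}      # flow key -> flow record
        self.export = export   # callback taking (record, reason)

    def add_packet(self, ts, meta):
        rec = self.records.get(meta.flow_key)
        if rec is None:        # first packet of a new flow
            rec = {"key": meta.flow_key, "start": ts, "end": ts,
                   "packets": 0, "octets": 0, "tcp_flags": 0}
            self.records[meta.flow_key] = rec
        rec["end"] = ts
        rec["packets"] += 1
        rec["octets"] += meta.length
        rec["tcp_flags"] |= meta.tcp_flags

    def expire(self, now=None):
        now = time.time() if now is None else now
        for key, rec in list(self.records.items()):
            if now - rec["end"] > INACTIVE_TIMEOUT:
                self.export(self.records.pop(key), "inactive timeout")
            elif now - rec["start"] > ACTIVE_TIMEOUT:
                self.export(self.records.pop(key), "active timeout")

    def shutdown(self):
        for key in list(self.records):
            self.export(self.records.pop(key), "exporter shutdown")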

RFC 5470 [28] considers both timeouts and resource constraints as causes for a flow record to expire but omits the natural close of a connection and the possibility of exporter shutdown. The flow record expiration settings can significantly influence not only the number of generated flow records, as shown by the authors of [62], but also further flow processing and various flow-based detection methods. The implementation of flow record expiration has an impact on the overall flow monitoring performance.


Periodically checking the flow cache for expired flows can create latency in packet capture that may result in packet loss. Several approaches to this problem are described in the work of Molina et al. [56]. Little attention has been given to the measurement of traffic with fragmented packets. Although the ratio of fragmented packets is decreasing [63], it is still vital to monitor them, since it is quite easy for an attacker to hide an attack from intrusion detection systems by fragmenting the attack packets [64]. RFC 3917, which describes requirements for IP flow information export, states the following about packet fragmentation:

In case of IP packet fragmentation and depending on the classification scheme, only the zero-offset fragment of a single initial packet might contain sufficient information to classify the packet. Note that this fragment should be the first one generated by the router imposing the fragmentation [RFC791], but might not be the first one observed by the IPFIX device, due to reordering reasons. The metering process may keep state of IP packet fragmentation in order to map fragments that do not contain sufficient header information correctly to flows.

Requirements for IP Flow Information Export (IPFIX) [24]

This means that a flow monitoring system implementing the IPFIX standard is not required to handle fragmented packets. Moreover, neither the NetFlow v9 nor the IPFIX flow definition covers the possibility of assigning a packet fragment to the correct flow, because they require the common properties to be derived from a single packet and the corresponding packet treatment. This is the main reason why Definitions 2.1 and 2.6 allow the common properties to be derived from data contained in previous packets of the same flow as well. We assume that the packets arrive in the appropriate order for the purpose of these definitions. Let us consider what happens when an IPv4 packet is fragmented (the problem is very similar for IPv6). Figure 2.7 illustrates this scenario. When the packet K is fragmented, its payload is distributed between several different packets (K#1, K#2, ..., K#M_K). Only the first packet (K#1) contains the transport header; the others only carry a flag identifying the fragment and the offset of the data in the original packet. When a flow record is created for the first part of the fragmented packet, it contains the IP addresses, transport layer information (protocol and ports), and the first part of the application payload. However, subsequent parts contain only information about IP addresses and consecutive parts of the application payload. The transport level information was already sent in the first fragment. Therefore, flow records created from the subsequent fragments are not aggregated with the first fragment. Moreover, flow records created from subsequent fragments from different connections between the same hosts are aggregated together. To resolve this problem, some kind of packet reassembly must happen either in the flow cache or during packet capture (e.g., packet reassembly in the OS network stack). A possible solution that has been successfully deployed in practice is to create temporary flow records for fragmented packets where fragmentation identifiers are part of the flow key instead of the transport layer information. Therefore, each fragmented packet has its own temporary flow record. By assigning a shorter inactive timeout to these flow records, the temporary flow record can be reinserted into the flow cache when all fragments are received. This allows the original packet to be efficiently reassembled in the flow cache with the proper transport layer information, which is available since it was present in the first fragment. The advantage of this solution is that it allows counting the number of reassembled packets as well as the number of packet fragments, which can be used in later analyses.
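One possible shape of the described temporary-record approach is sketched below; keying fragments by the IPv4 identification field follows the text above, while the record structure and function names are hypothetical.

# Non-first fragments carry no transport ports, so fragments are keyed by
# (source, destination, protocol, IP identification) and collected in a
# temporary cache until reassembly; only the key derivation and counting
# are shown here.
import struct

def fragment_key(ip_header: bytes):
    """Return (is_fragment, is_first_fragment, key) for an IPv4 header."""
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    offset = flags_frag & 0x1FFF
    is_fragment = bool(flags_frag & 0x2000) or offset > 0
    ident = struct.unpack("!H", ip_header[4:6])[0]
    # All fragments of one original packet share src, dst, protocol, and the
    # IP identification value, so they meet in a single temporary record.
    key = (ip_header[12:16], ip_header[16:20], ip_header[9], ident)
    return is_fragment, offset == 0, key

def account_fragment(temp_cache: dict, ip_header: bytes, payload_len: int):
    is_frag, is_first, key = fragment_key(ip_header)
    if not is_frag:
        return None                       # handled by the ordinary flow cache
    rec = temp_cache.setdefault(key, {"fragments": 0, "octets": 0,
                                      "has_l4_header": False})
    rec["fragments"] += 1
    rec["octets"] += payload_len
    rec["has_l4_header"] |= is_first      # first fragment carries the ports
    return rec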


Figure 2.7: Flow Measurement of a Fragmented Connection.

Flow monitoring implementations based on the flow caches of networking devices kept a separate flow for each direction of a network connection. The reason was that each direction usually had at least different ingress and egress interfaces. However, for network monitoring and analysis purposes, it is often useful to be able to access flow records for both communication directions at the same time. For this reason, bidirectional flow export was proposed in 2005 and resulted in the publication of RFC 5103 [27] in 2008. The document proposes to aggregate flow records for both the forward and reverse directions into a single flow record. A bidirectional flow is called a biflow and a unidirectional flow is called a uniflow. The flow keys that are associated with a direction (e.g., the source IP address) are called directional flow keys. When a biflow record is created, some information must be stored for both directions (e.g., the numbers of bytes and packets), and some information is common to both directions (e.g., the flow keys). When biflows are used, the corresponding flow records stored in the flow cache become biflow records as well. Note that the creation of biflows is permitted by the NetFlow v9 and IPFIX definitions as well as by Definition 2.1. Although we work mostly with uniflows in the context of this thesis, most of the concepts apply to biflows as well. We will explicitly mention when there are important differences between uniflows and biflows.
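A minimal sketch of how both directions can be mapped to the same biflow record is given below: the two endpoints are ordered canonically before the flow cache lookup. The ordering rule itself is an arbitrary assumption; only its consistent application matters.

# Direction-agnostic biflow key: packets from both directions of a connection
# yield the same key, plus a flag telling the cache which per-direction
# counters (forward or reverse) to update.
def biflow_key(src: bytes, dst: bytes, proto: int, sport: int, dport: int):
    if (src, sport) <= (dst, dport):
        return (src, sport, dst, dport, proto), True    # forward direction
    return (dst, dport, src, sport, proto), False       # reverse direction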

2.4.4 Flow Export

The flow export manages the process of delivering flow records to flow collectors. The process consists of several parts, as shown in Figure 2.8. The flow sampling and filtering process is very similar to the packet sampling and filtering described in Subsection 2.4.2. The primary motivation for flow filtering is that each collector might want to process only a subset of the data, e.g., from a particular subnet. Flow sampling is performed when the performance of the collector is not sufficient to process all flow records. The main difference between packet sampling and flow sampling is that flow sampling always discards a whole flow record, while packet sampling can discard some packets of a flow, therefore affecting the created flow record.

Figure 2.8: Flow Export Schema.

Flow export protocols such as NetFlow or IPFIX describe how to serialise a flow record and how to use different transport protocols such as UDP, TCP, and SCTP to transfer the encoded data to the collector. The NetFlow v5 and v9 protocols are still widely used, although the IPFIX protocol is achieving wide deployment since it offers additional features and supports proprietary information elements as well as variable-length information elements. Different flow export protocols developed by other vendors can be used as well, as described in Subsection 2.1.1.


The UDP transport protocol is used very often to carry flow records since it was the only one supported by NetFlow. IPFIX specifies the use of the TCP and SCTP protocols for flow export as well as the use of the UDP protocol. Although SCTP is mandatory for IPFIX implementations, its real-world usage is hindered by a lack of support in software and hardware. When reliable transport is necessary, TCP is the most often used protocol. For more information about IPFIX transport protocols, see RFC 7011 [37]. A comprehensive comparison of IPFIX transport protocols is also provided in [15]. Flow records can be exported in many formats over different transport protocols. With the increasing popularity of big data tools such as Kafka, Elasticsearch, Apache Spark, and Hadoop, the need for a universal and widely supported format is more crucial than ever before. As most of the big data tools support JSON, flow records are often transported and stored in the JSON format. The main advantage of JSON is that the data are stored in a key-value format without the need to specify templates for the data beforehand. The obvious disadvantage is the increase in the message size due to the need to provide a key for every value in every flow record. It is not unusual for JSON-encoded messages to be more than seven times larger than in the IPFIX format. An important part of flow export is ensuring the security of the exported flow records. The records must reach only the authorised destination without being observed by any third party. Therefore, authorisation, confidentiality, and, preferably, also integrity should be provided. The same applies to flow collectors during the collecting process. More information about flow transmission security is provided in Subsection 2.5.1.
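The size difference mentioned above can be illustrated with a rough comparison of the same record packed as fixed binary fields (an IPFIX-like layout whose template the collector is assumed to know) versus self-describing JSON; the field names and values are only an example.

# Rough size comparison: binary packing with an implicit template vs. JSON,
# where every record repeats the element names as keys.
import json
import socket
import struct

record = {
    "sourceIPv4Address": "192.0.2.1",
    "destinationIPv4Address": "198.51.100.7",
    "sourceTransportPort": 52311,
    "destinationTransportPort": 443,
    "protocolIdentifier": 6,
    "packetDeltaCount": 42,
    "octetDeltaCount": 31337,
}

binary = struct.pack(
    "!4s4sHHBQQ",
    socket.inet_aton(record["sourceIPv4Address"]),
    socket.inet_aton(record["destinationIPv4Address"]),
    record["sourceTransportPort"],
    record["destinationTransportPort"],
    record["protocolIdentifier"],
    record["packetDeltaCount"],
    record["octetDeltaCount"],
)
as_json = json.dumps(record).encode()
print(len(binary), len(as_json))  # 29 octets vs. roughly 210 octets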

2.5 Flow Data Processing

The aim of this section is to outline the processing of flow data after they are exported to a flow collector. Firstly, we describe the flow collection process in Subsection 2.5.1. The reception of flow data is not a difficult task in itself but requires several important choices to be made, such as what flow export protocols will be supported and how to handle unknown information elements. Secondly, the flow data are often stored for future needs. The choice of a flow data storage format is fundamental, since the data are written once, read many times, and fast searches are required. This process is described in Subsection 2.5.2. Lastly, the flow records are processed and analysed, either by reading the previously stored data or in real time as they arrive from a flow exporter. The flow record processing and analysis is discussed in Subsection 2.5.3. The whole flow data processing schema is shown in Figure 2.9.

2.5.1 Flow Collection

Flow collection is the process of receiving flow records from an exporter, performed by a flow collector. The collector and the exporter must choose an appropriate combination of transport protocol and flow export protocol before the data transmission starts, as there is no protocol for negotiating flow export parameters. Although the collector can receive data using multiple combinations of protocols at the same time, such a feature is not usually implemented due to the added complexity. After the collector validates that a received message is in the expected format, it attempts to parse individual flow records. This process may vary for each flow export protocol; however, the collector always needs to derive the correct data types of information elements in order to be able to work with them. The common practice with NetFlow information elements was to hardcode their definitions and extend the code every time it was necessary to add a new element. The extensibility of the IPFIX protocol information elements through the use of Private Enterprise Numbers advocates a more dynamic approach to information element handling, although a subset of IPFIX information elements is sometimes hardcoded as well.

Figure 2.9: Flow Data Processing Schema.


            IP Spoofing   Man-in-the-Middle
UDP         yes           yes
TCP         no            yes
SCTP        no            yes
DTLS/UDP    no            no
TLS/TCP     no            no
DTLS/SCTP   no            no

Table 2.4: Applicability of IP Spoofing and MITM Attack to IPFIX Transport Protocols.

Since the collectors cannot know all possible information elements that can be sent by exporters, it is necessary to handle new and unknown elements correctly. Although they are usually discarded, it is also possible to process them when their data types can be derived from the information provided by the flow export protocol. When using the NetFlow and IPFIX protocols in combination with the UDP transport protocol, it is necessary to properly manage template messages containing the definitions of flow record formats. The templates can change during the collection process, e.g., after an exporter restart, which can easily result in incorrectly decoded messages. The IPFIX protocol specification [37] describes how the template messages should be handled and what to do in case of missing templates. Both the NetFlow and IPFIX protocols keep track of lost messages by assigning a sequence number to each message. NetFlow v5 increases the sequence number for each transmitted flow record, which allows keeping track of the number of lost flow records. This behaviour was changed in NetFlow v9 so that the sequence number is increased for each transmitted NetFlow v9 packet. It is more difficult to keep track of the number of lost flow records in NetFlow v9, but easy to see how many packets were lost, which can help with determining the cause of the packet loss. The IPFIX protocol increments the sequence number per flow record, just like NetFlow v5. However, when a template is missing, it is not possible to determine the exact number of flow records in a packet. Therefore, the collector cannot be sure whether flow records were lost during the transmission or not. This is one of several reasons to use IPFIX over a reliable channel. An important and often neglected aspect of flow collection is security. The collector needs to authenticate the source of the data to avoid processing malicious flow records or malformed messages intended to attack the collector itself. There are several ways to authenticate the source of the data. IP addresses can be used to authenticate the flow exporter and the collector itself, or a firewall can be set to allow data from authorised IP addresses. However, this does not stop attackers with the ability to spoof IP addresses when the UDP transport protocol is used. Moreover, a Man-in-the-Middle (MITM) attack cannot be prevented without the appropriate use of cryptography. Proper authentication, usually using certificates, is needed to mitigate the MITM attack. The IPFIX protocol requires TLS version 1.1 to be implemented for TCP and DTLS 1.0 to be implemented for the SCTP and UDP protocols. Moreover, it requires mutual authentication when using TLS or DTLS so that the exporter does not send sensitive data to an unknown target. Table 2.4 shows how IP spoofing and the MITM attack are mitigated by the use of the TLS and DTLS protocols. The advantage of the TLS and DTLS protocols is that they provide not only authentication but also integrity and confidentiality. Since an IP address is often considered to be sensitive information, ensuring the confidentiality of flow records during transmission should be considered a necessity. Although the IPFIX protocol requires the implementation of the TLS and DTLS protocols, not every flow exporter and collector supports it.

There are two frequently used solutions for ensuring the security of flow record transmission. The first is to build a secure channel between exporters and collectors using external tools, such as virtual private networks or stunnel [65]. The second solution, used in networks that are fully under the control of the administrator, is to place the exporter and collector in a dedicated network (using private IP addresses) so that they can be accessed only from an allowed and trusted address range.
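Returning to the sequence-number bookkeeping described earlier in this subsection, a collector can estimate transmission loss per exporter roughly as sketched below. The class and its interface are hypothetical; 32-bit counter wrap-around and message reordering are deliberately ignored.

# Loss estimation from export sequence numbers: 'increment' is the number of
# flow records carried by the message (NetFlow v5, IPFIX) or simply 1 per
# message (NetFlow v9), matching the per-protocol semantics discussed above.
class LossCounter:
    def __init__(self):
        self.expected = None    # next expected sequence number
        self.lost = 0           # total missed units (records or packets)

    def update(self, sequence_number: int, increment: int) -> int:
        missed = 0
        if self.expected is not None and sequence_number > self.expected:
            missed = sequence_number - self.expected
            self.lost += missed
        self.expected = sequence_number + increment
        return missed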

2.5.2 Flow Storage

There are many uses for flow data [66, 67]. At first, the data were used for traffic accounting. Nowadays, flow records are often kept by telecommunications companies to comply with laws requiring them to store communication records. Companies and even individuals also keep flow records of their own traffic so that they can investigate breaches and other security issues ex post. To store flow data records for a long period of time, the selected data storage format must be space-efficient. Another important aspect of flow data storage is its query performance. The requirements for the query performance differ depending on the particular use case. When overall statistics are required, all flow records need to be accessed. In the case of a security breach investigation, filters for particular IP addresses and ports are likely to be used, so it is beneficial for the data storage to have some kind of index built on the data. There are several types of data storage that are used for flow data. Firstly, the data can be stored in a flat file without indexes. This approach is utilised by the popular flow data manipulation frameworks SiLK [68] and NFDUMP [69]. The main advantage is that the format is very simple and the data are efficiently encoded. Moreover, both file formats support compression, which allows more disk space to be saved. Flat files also take advantage of the fact that flow data are never updated, so the records can be tightly packed. A second popular approach is to store flow data in a relational database. In the case of NetFlow v5, a single table serves to store all flow records. However, when multiple templates are used for the flow records, such as in the NetFlow v9 and IPFIX protocols, the number of tables grows, and the relational model becomes complicated. Although relational databases are designed to support many more features than needed for flow data (e.g., updates or complex joins), they often lack support for network data types such as IP and MAC addresses. Moreover, their performance and disk space requirements are often worse than those of flat files, as shown by Hofstede et al. in [70]. The last approach is to use a NoSQL database to store flow data. Although NoSQL databases do not offer as many features as relational databases, they are meant to be able to handle huge amounts of data. One category of NoSQL databases is columnar databases. Instead of storing each flow record as a single database record, a columnar database stores each element of the record in a separate file. The benefit of this approach is that, for example, when a query for a certain IP address is evaluated, only the files with IP addresses need to be searched. Other requested elements are read from their own files as necessary after the address is found. This approach reduces the amount of data that needs to be read from the disk; however, it causes more random accesses than when using flat files. We have compared the flat file approach with a NoSQL columnar database in [A2]. Document-oriented databases are also a class of NoSQL databases, and their popularity has been increasing lately. Although the performance of most document-oriented databases is not sufficient to handle flow data from large networks, they are used for their flexibility, which allows prototype analytics applications to be built very quickly. The advantage of document-oriented databases is that the different structure of flow records is of no concern to the database, as it considers each entry to be a generic document.
Elasticsearch [71] is an excellent example of such a database. The list of the approaches used is by no means complete. There are many Big Data analytics frameworks that are used for flow data storage and analytics, such as Hadoop [72].
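As a toy illustration of the column-oriented layout described above, the sketch below keeps each flow record element in its own list (standing in for a separate file), so a query filtering on IP addresses touches only the address columns; compression, indexing, and on-disk segments are omitted.

# Column-oriented toy storage: one list per flow record element.
columns = {"srcip": [], "dstip": [], "port": [], "octets": []}

def append_record(srcip: str, dstip: str, port: int, octets: int) -> None:
    columns["srcip"].append(srcip)
    columns["dstip"].append(dstip)
    columns["port"].append(port)
    columns["octets"].append(octets)

def octets_for_address(address: str) -> int:
    # Scan only the address columns; read the octet column just for matches.
    return sum(columns["octets"][i]
               for i, (s, d) in enumerate(zip(columns["srcip"], columns["dstip"]))
               if address in (s, d))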


Although it is possible to use almost any data storage for flow data, it is difficult to select the best one, since the criteria differ for each use case. It is for that reason that companies use different flow data storages or develop their own solutions for their products.

2.5.3 Flow Processing

A large amount of information about network traffic and the communicating parties can be acquired by flow record analysis. Many intrusion detection systems use flow records as their primary source of data, especially since deep packet inspection is being made difficult by the increasing ratio of encrypted traffic. The flow data analysis can be performed either using the stored data or in real time. When the flow record analysis is performed on stored data, the processing is usually done in batches. The flow collector partitions the data according to the time of arrival, so that once a partition is finished, it can be processed in a single batch. Five-minute time windows are often used, as this is the default time interval employed by the popular flow processing tool NfSen [73]. The advantage of this approach is that the processing can easily be repeated on the same data, which allows the user to fine-tune the queries and analyse the data on demand. The processing can also be postponed when the system is under heavy load. The disadvantage of batch processing with fixed time intervals is that it introduces delays into the data analysis, which can be inconvenient when rapid action is required based on the results. When the delay induced by the batch processing impedes time-critical applications, stream processing can be used instead. In stream processing, the flow records are passed to the processing immediately after they are received. A good overview of stream-based flow processing is provided by Jirsik et al. in [74]. There are a number of stream processing systems that are used for big data processing. Čermák et al. performed flow data analysis using the three most used distributed stream processing systems in [75] and compared their performance. The results show that the distributed stream processing systems are able to scale to accommodate large workloads, such as processing data from very large networks. The flow processing can be used to achieve several goals. Firstly, both live and long-term network statistics can be computed. Live statistics are used by administrators to gain an overview of the network situation. This is useful for tracking down problems and for optimising the network configuration, as the changes take effect immediately. The long-term statistics provide useful information for capacity planning. Secondly, flows can be used to gain situational awareness about the network and connected devices [76]. This includes device identification and application classification, which often rely on machine learning techniques. Thirdly, flows are nowadays frequently used in intrusion detection systems to detect attackers on a network, as described by Umer et al. in [67]. Lastly, anomaly detection techniques are being applied to flow data to detect suspicious behaviour which might indicate a problem or an attack on the network. There are many anomaly detection techniques, as surveyed by Chandola et al. in [77], and significant effort has been put into applying these techniques to flow data by various researchers. Modern machine learning techniques are utilised in flow processing. The most common uses are in traffic classification [A3], user identification [78], and intrusion detection [79]. Although the research in this area is quite extensive, the authors of [80] show that the results are seldom applied in practice and provide several observations explaining the difficulties faced by the implementers.
A survey of traffic classification techniques for encrypted traffic is provided in Chapter 6.


2.6 Common Issues

Although flow monitoring is widely used, there are many not widely known aspects that need to be taken into consideration during its deployment. This section describes several issues that can be encountered in any flow monitoring system and that are easily neglected. In this section, we draw mainly from our experience of running the flow monitoring infrastructure of the CESNET National Research and Education Network. Most issues concern data loss: either the data are not correctly observed or parsed, are lost during transmission, or are not interpreted correctly. Firstly, we look at the more obvious and readily detectable issue where no data arrive at all. Then we describe the issues that cause data loss which is not easily detected. Lastly, we discuss issues and misunderstandings that cause the data to be incorrectly measured and interpreted. Let us note here that suitable logging facilities of the flow exporter and collector, together with a reliable log processing mechanism, can help to discover many of the described issues.

2.6.1 Visible Data Loss

The usual cause of data loss is that the monitoring infrastructure is misconfigured. Whether it is due to a misconfiguration on the flow exporter or the flow collector side, the obvious indication of the problem is that no data are stored and no results of the flow data processing, such as statistics, are generated. As the problem is quite easily observed, its cause is also easily detected. The best way is to review the whole flow processing setup, from packet observation to flow storage and processing, to see at which point the data are lost. The first component to investigate is the packet observation. It is possible that the data are no longer sent to the observation point, possibly due to a failed TAP device or a router misconfiguration. This can be easily confirmed using interface packet counters. Even if the data arrive at the observation point, the flow exporter might be configured to read data from an incorrect interface. When the flow exporter is receiving and processing the data, it is important to audit the flow export settings. The data might be sent to an incorrect destination or using an incompatible transport or flow export protocol. When TLS/DTLS is used, the certificates and keys might not match. It is a good practice to verify that the data actually leave the probe, for example by using a tool such as tcpdump. Since the flow probe is often placed in a dedicated network, it is useful to verify that the network is configured to allow the communication between the probe and the flow collector. The flow collector must be configured to accept data from the flow exporter, which includes a proper setup of certificates and keys. Even when the flow collector receives the data, it might be running with an incorrect configuration, and the data might be saved to an unexpected location or not passed to storage and processing at all. A collector may also decide to discard flow records containing unknown information elements. When the flow exporter includes such an element in every record, all flows are dropped.

2.6.2 Unobserved Data Loss

It is impossible to continuously verify that all observed packets are accounted for in the flow records processed by the flow collector. Since each flow record contains information from packets observed over a different time interval, it is not possible to compute the exact number of packets that passed an observation point from the respective flow data. All statistical information about the number of transferred packets is, to a certain level, biased. The only test that can be continuously applied is that the total numbers of observed packets at the observation point and at the flow collector remain in a similar range. Since it is not possible to verify that every packet is observed, some packets or even whole flows might be lost in the flow monitoring process without notice.


Figure 2.10: Flow Monitoring Packet Drop Schema.

There are many networking protocols in use nowadays. Although all flow exporters support the basic protocols such as IPv4, TCP, UDP, and ICMP, they also need to be able to process many data link layer protocols, such as VLAN, VXLAN, MPLS, TRILL, and others. When a new protocol for which the flow exporter is not ready is used on the network, the flow record cannot be created, and the information is lost. A similar issue involves tunnelling protocols, such as GRE or protocol 41. For example, when the original IP packet is encapsulated in another L3 protocol, it is important to decide whether the flow record should be created from the inner or the outer L3 header. Without an additional mechanism to provide more information about the tunnelled traffic, information from one of the headers is lost. Flow records containing unknown information elements might be discarded by the flow collector, as described in the previous section. However, when only some of the flow records contain such information elements, the issue is harder to observe. Therefore, it is necessary to have a good knowledge of the used flow collector and its behaviour. Sometimes messages can be longer than the MTU of the link between the flow exporter and the flow collector, especially when application layer information is contained in the flow records. In such a case, the use of the UDP transport protocol might cause data loss, as fragmented packets are sometimes blocked by firewalls. Moreover, the packet loss is more probable, since by losing a single fragment the entire message becomes unreconstructable and must be discarded. When such long messages need to be sent, it is best to use a reliable transport protocol such as TCP or SCTP. One of the most common reasons for data loss in flow monitoring is that the performance of the flow exporter or the flow collector is not high enough to process all data. There are several places where either packets or flow records might be dropped, as shown in Figure 2.10. The first is the source of data for the observation point. When the data for the observation point are provided using a router SPAN port, the capacity of the SPAN port must be equal to or greater than that of the individual links that are measured; otherwise, some packets will be dropped by the router when there is too much traffic. This is not an issue when using a TAP, because the TAP always splits a single link or a duplex link into two links of the same type and capacity. The second point where packets are often dropped is the observation point. When the exporter does not process the data before the packet reception buffer is full, the newly arrived or the oldest packets are dropped, based on the implementation of the buffer. The third point where the data might be dropped is the collector device. When the flow collector does not process data quickly enough, the system packet reception buffers fill up, and new packets are dropped. This is similar to the packet drop on the flow probe; however, every packet lost at the collector contains several flow records that were created from many packets. Therefore, the loss of packets with complete flow records is much more severe. The choice of the transport protocol has an impact on the packet drop as well. UDP messages are discarded by the collector when it does not process them in a timely manner. However, when using a congestion-aware protocol such as TCP or SCTP, the collector might create backpressure on the flow exporter so that it cannot send new messages when it needs to.
Therefore, the flow exporter can either drop such messages or wait for the collector to process more data.

However, when the flow exporter does not drop the messages, it creates backpressure on the flow cache, packet processing, packet parsing, and packet reception. This way, slow processing on the collector might easily lead to packet drop at the observation point. Another reason for the flow exporter not being able to send messages is a slow link between the exporter and the collector. When application information is present in the flow records, the required throughput might easily exceed 100 Mbps, especially when the exporter sends data to multiple collectors. Although such slow links are not used very often nowadays, they might still be encountered in practice. Although packet drop should be avoided on both the flow exporter and flow collector sides, sometimes it is necessary to drop packets in a controlled manner, e.g., under a DDoS attack. We recommend creating a packet buffer in the application to which the data are always stored. Then, when the data processing is slow and the buffer becomes full, the application can decide which packets to drop. The advantage of this approach is that the application is aware of the congestion and can adequately react to it, e.g., by reporting it or limiting the processing functionality to a bare minimum. A frequent cause of packet loss on the collector side is the shutdown or restart of the flow exporter. The flow exporter usually expires all flow records in the flow cache so that they are not lost. However, as the flow cache may contain hundreds of thousands of flow records, exporting them all at once at the highest possible rate (the flow exporter needs to shut down or restart without unnecessary delay) can overwhelm the flow collector. Therefore, it is best to implement two types of shutdown: emergency and graceful. In the emergency shutdown, the data in the flow cache are simply discarded, as they are not considered important. This allows the exporter to be shut down or restarted quickly, which is beneficial during testing and for important configuration changes. The graceful shutdown should have a configurable flow record export rate. This allows adapting to the capabilities of the flow collector without overwhelming it and limits the risk of data loss.
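A minimal sketch of the application-level packet buffer recommended above is given below; the capacity and the policy of dropping the newest packet are arbitrary choices made for the example.

# Bounded packet buffer with a controlled drop policy: when full, the
# application itself decides what to discard and the congestion stays visible.
from collections import deque

class PacketBuffer:
    def __init__(self, capacity: int = 100_000):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def push(self, packet) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped += 1      # report or react to congestion here
            return False           # this sketch drops the newest packet
        self.queue.append(packet)
        return True

    def pop(self):
        return self.queue.popleft() if self.queue else None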

2.6.3 Other Issues

There are other problems that can be encountered during flow monitoring apart from the performance issues and data loss. We have already described the problem with counting lost flow records and elements in Subsection 2.5.1. This section describes the most common of the issues that cause the data to be incomplete or incorrect. The first issue is encountered by flow exporters that process biflows. To process biflows correctly, the exporter needs to observe both directions of each flow. There are several reasons why it may fail to do so. Only a single direction might be connected to the probe, or the network configuration might route the packets asymmetrically. Even if both directions can be observed at the observation point, packets might be received, parsed, and processed in several different queues to speed up the monitoring process. In such a case, it is essential to use a correct algorithm to select queues for packets; otherwise, the packets from opposite directions may end up in different flow caches, and the created biflows will be incomplete. The algorithm must be able to work with all packet headers that precede the headers from which the flow keys are derived. Both the flow exporter and the flow collector must agree on the semantics of the elements of exported flow records. For example, the packet length is usually exported in the IPFIX format as an octetDeltaCount element. The specification [81] defines it to be the number of octets since the previous report (if any) in incoming packets for this Flow at the Observation Point. The number of octets includes IP header(s) and IP payload. However, some exporters put the total number of octets including the link layer in this field. Moreover, the IPFIX protocol is sometimes used to export information about non-IP flows as well, so that this element is used differently from the specification.


Figure 2.11: TCP control bits export in IPFIX protocol [83].

Whether the exporter and collector adhere to the specification or not, it is important that they interpret the semantics equally. In this case, it would be appropriate to set up a different element with different semantics if the total number of octets needs to be reported. Different exporters can encode flow record elements as integers, strings, and byte arrays of different lengths. For example, the IPFIX protocol supports Reduced-Size Encoding [37], which allows encoding elements using fewer octets than the number specified by [81]. Although this saves resources such as the memory of the flow exporter, network bandwidth, and disk space on the collector, it makes correct processing more difficult. When the same element is received by the collector using two different data types, it needs to decide which one is going to be used for further processing, as maintaining both original data types is cumbersome. The collector usually converts the data to the largest possible data type, since it cannot know in advance whether it will receive the element encoded differently in the future, and reducing the size of an element may lead to information loss. The encoding of TCP flags in the IPFIX protocol is a good example of the issues that can be caused by the Reduced-Size Encoding. NetFlow v9 defined the TCP flags element as one byte, and IPFIX originally used a single byte as well. However, the TCP control bits occupy nine bits in the TCP header [82]; therefore, a change of the encoding to two bytes was proposed in 2013. The discussion resulted in RFC 7125 [83] in 2014, with IPFIX supporting the export of TCP flags as a two-byte value, as shown in Figure 2.11. When Reduced-Size Encoding is used, only the original byte is exported for backwards compatibility. This presents the authors of flow collectors with a difficult issue. If they support TCP flags as a two-byte value, they must indicate that the received element was only a single byte long. Storing the two-byte value and padding it with zeroes is not an option, since that would set the ECN nonce sum flag to zero although the collector has no knowledge of its original value.
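One way a collector might cope with the situation described above is sketched below: it records, alongside the value, whether the upper octet of the tcpControlBits element was actually received, instead of silently padding with zeroes. The function is an illustrative assumption, not the behaviour of any particular collector.

# Normalise a received tcpControlBits element of one or two octets while
# remembering whether the upper bits (including the ECN nonce sum) are known.
def decode_tcp_control_bits(raw: bytes) -> dict:
    if len(raw) == 1:
        return {"value": raw[0], "upper_bits_known": False}
    if len(raw) == 2:
        return {"value": int.from_bytes(raw, "big"), "upper_bits_known": True}
    raise ValueError("unexpected tcpControlBits length: %d" % len(raw))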

2.7 Summary

This chapter has explained flow monitoring, its history, and common uses, and discussed related technologies that are sometimes confused with flow monitoring. We have shown that the currently used definitions of a flow do not reflect the reality of real-world flow monitoring deployments. Therefore, we have provided an improved definition that amends the deficiencies. We have also created a formal notation for our definition, which allows us to avoid misunderstandings created by the use of informal language. It also allows flow-related algorithms to be written more precisely and concisely. The rest of the chapter has discussed details of the flow monitoring process, from the packet capture on a probe to the flow processing on a collector. We have also shown the most common issues that can be encountered while deploying and operating a flow monitoring infrastructure. The most critical issues are caused by incorrect implementations and low performance. Therefore, we investigate the performance of flow monitoring in Chapter 5.

Chapter 3

Application Flow Monitoring

Deep packet inspection (DPI) and basic flow monitoring are frequently used network monitoring approaches nowadays. Although DPI provides application visibility, a detailed examination of every packet is computationally too intensive to be performed on high-speed networks. Basic flow monitoring achieves high performance by processing only packet headers but provides no details about the traffic content. Application flow monitoring is proposed as an attempt to combine DPI accuracy with basic flow monitoring performance. The aim of this chapter is to provide complete information about application flow monitoring. The contribution of this chapter is threefold. Firstly, it provides an overview of the current state of application flow monitoring. Secondly, it defines consistent terminology and provides a practical definition of application flow and application flow record. The third contribution is an experimental study of an HTTP parser design. Despite extensive work, flow exporters generally fall short of performance goals due to extracting application layer data. Constructing an efficient protocol parser for in-depth analysis is a challenging and error-prone affair. We designed and evaluated several HTTP protocol parsers representing current state-of-the-art approaches used in today's flow exporters. We show the packet rates achieved by the respective parsers, including the throughput decrease (performance implications of the application parser), which is of the utmost importance for high-speed deployments. The presented results provide researchers and network operators with important insight into application flow monitoring performance. The paper included in this chapter is [A4]; another paper related to this chapter is [A5]. The organisation of this chapter is as follows:

• Section 3.1 provides motivation for processing of the application layer in flow monitoring.
• Section 3.2 outlines the state of the art in the field of application flow monitoring.
• Section 3.3 defines the necessary terminology and provides definitions of application flow and application flow record.
• Section 3.4 describes how the application layer processing affects the flow monitoring.
• Section 3.5 is a case study of an HTTP parser design and of the effect of processing the HTTP protocol on the flow monitoring process.
• Section 3.6 summarizes the chapter.


3.1 Motivation

The number of different applications communicating over the Internet is ever increasing, and so is the need for application-aware network monitoring. However, building network monitoring systems is always a compromise between accuracy and performance. The more detailed the information processing, the more accurate the monitoring system is. Unfortunately, a thorough examination of the traffic is computationally expensive [84, 85]. Application flow monitoring is a network monitoring approach created to exploit the benefits of deep packet inspection (DPI). Integration of DPI into flow monitoring allows for information aggregation, which provides better performance than DPI alone.

Application flow monitoring is a subset of flow monitoring as described in Chapter 2, and all provided definitions hold for it as well. The reason to treat application flow monitoring as a special case is that processing the application layer introduces specific issues which require particular attention. Therefore, we distinguish basic flow monitoring (flow keys and values of other properties are extracted only from the link, network, and transport layer headers) and application flow monitoring as two distinct parts of flow monitoring.

Application flow monitoring usually utilises two main concepts: application identification (also known as traffic classification) and application visibility. A complete definition of application flow monitoring is provided in Section 3.3. Application identification amounts to recognising the application protocol of a particular flow. The type of application is usually added as a single field to the exported flow record. Application visibility provides more information about the information carried by the application protocol itself. Application identification is a prerequisite for application visibility. However, application identification can be performed with the use of machine learning techniques even without observing packet payloads.

This chapter describes the differences between basic flow monitoring and application flow monitoring that have to be taken into consideration when the application flow monitoring process is designed and deployed. The most important part of application visibility is the design of application parsers. To illustrate the complexity of the application parser design, we propose and discuss several designs of HTTP protocol parsers at the end of this chapter. The benefits of application flow monitoring are discussed extensively in Chapter 4.

3.2 Related Work

The use of machine learning techniques for traffic classification has attracted many researchers, as demonstrated by [86, 87, 88]. A complete survey of the used techniques and results is out of the scope of this work; however, the methods of application identification in encrypted traffic are surveyed in Chapter 6.

3.2.1 Application Parsers

Although application visibility is provided by a variety of commercial products such as dedicated probes, forwarding devices, and firewalls, it does not seem to be as attractive a research topic as application identification, where various machine learning and pattern matching algorithms can be applied. However, a significant research effort was invested in automating the creation of application parsers. Pang et al. created a language and accompanying parser called binpac [89] in 2006. It allows generating application parsers from their declarative description. A slightly different approach was taken by Caballero et al. in 2007. The authors created a tool called Polyglot [90] which is used to reverse engineer application protocol headers. Similar work was published at the same time by Cui et al. in [91]. They presented a tool called Discoverer that could automatically reverse engineer the protocol message formats of an application from its network trace. Davidson et al. introduce a notion of using a higher order attribute grammar [92],

which allows describing the structure of application protocols for which the use of a context-free grammar is impractical or impossible. Another framework, called Spicy [93], was introduced by Sommer et al. It consists of a format specification language, a compiler toolchain, and an API for DPI applications, which allows for easy integration of the generated parsers into existing tools.

3.2.2 Application Flow Exporters

There are a number of open-source tools and commercial products that support the export of flows including application level information. Most of them already support the IPFIX protocol for the encoding of application information [15]; however, there are still a few that add custom elements to the NetFlow v9 protocol, which can cause element collisions and compatibility problems. Table 3.1 shows an overview of the flow exporters discussed in this section.

Vendor             | Product                            | IPFIX | App. ident. | App. visibility
CERT NetSA         | YAF                                | ✓     | ✓           | FTP, HTTP, IMAP, RTSP, SIP, SMTP, SSH, DNS, SSL/TLS, IRC, NNTP, POP3, SLP, TFTP, MySQL, DNP3, Modbus and RTP
ntop               | nProbe                             | ✓     | ✓           | –
ntop               | nProbe Pro                         | ✓     | ✓           | HTTP, DHCP, DNS, MySQL, Oracle DB, BGP, IMAP, POP3, SMTP, Radius, Diameter, GTP, S1AP, SSDP, NetBIOS
Flowmon Networks   | Flowmon Probe                      | ✓     | ✓           | HTTP, DNS, DHCP, SMB, E-mail, MSSQL, VoIP SIP and other protocols
Cisco              | Application Visibility and Control | ✓     | ✓           | –
Lancope            | Stealthwatch FlowSensor            | ✓     | ✓           | –
Palo Alto Networks | Next-Generation Firewall           | –     | ✓           | –

Table 3.1: Flow Exporters Supporting Application Flow Monitoring.

Two open-source flow exporters pioneer application flow monitoring: YAF and nProbe. YAF (Yet Another Flowmeter) [94] was created as a reference implementation of a flow exporter conforming to the IPFIX standard. YAF supports custom rules for application identification [95]. It can match applications by regular expressions in combination with ports, by signatures, or by using dynamically loaded plugins for processing packet payloads. Application visibility [96] is supported for flows where application identification succeeded. YAF allows loading plugins that perform the DPI and export the optional information elements using the subTemplateMultiList feature of the IPFIX protocol. Application protocols supported according to the YAF documentation are listed in Table 3.1.

nProbe [97] is a flow probe originally created by Luca Deri and published in 2003. Although nProbe is both a flow exporter and a flow collector, we focus only on its features as an exporter. Application identification was added to nProbe in 2011 [10]. It is based on the OpenDPI open-source traffic identification library; however, the authors of nProbe improved the library and showed that their version (nDPI) could be used for high-speed traffic identification [98]. nProbe supports a custom flow export format for the NetFlow v9 and IPFIX protocols. The user is allowed to configure her own list of templates that is used to transport data to a flow collector. Support for application visibility is provided by the use of optional plugins.

However, these plugins are not available for the open-source version of nProbe and can only be used with the nProbe Pro version.

There are multiple commercially available probes, forwarding devices, and firewalls that support at least limited application flow monitoring. It is often difficult or impossible to gain detailed information about the level of application identification and visibility supported by commercial devices. The information provided in the rest of this section is therefore only an overview of some of the available products based on publicly obtainable information.

The nProbe Pro version is a commercial version of nProbe supporting plugins that provide application visibility. Application protocols supported according to the nProbe documentation are listed in Table 3.1.

Another flow monitoring probe is the Flowmon Probe [99] from Flowmon Networks. Support for application identification is provided using the libprotoident [100] library, and application visibility is provided by multiple application processing plugins. The application visibility is independent of the application identification; therefore, each application processing plugin must have its own application identification algorithm for the supported application protocol. The Flowmon Probe exports flow records using the IPFIX protocol, and the application fields are encoded using the Flowmon Networks Private Enterprise Number (PEN).

Cisco provides support for application identification using the Cisco Application Visibility and Control (AVC) [101] technology both in forwarding devices and in dedicated probes called NetFlow Generation Appliances. The AVC uses the Network-Based Application Recognition 2 (NBAR2) Protocol Library [102] for application identification and the NetFlow and IPFIX protocols for flow export. NBAR2 is a deep packet inspection library which uses signatures to classify traffic into categories and subcategories. Although the technology is called application visibility and control, no application visibility is supported by NBAR2; therefore, no details from application protocol headers are exported by the AVC.

The Stealthwatch FlowSensor [103] is a flow monitoring probe by Lancope which supports application identification using DPI and behavioural analysis. Flow export using the IPFIX protocol is supported, as well as the older NetFlow protocols.

An example of a firewall with limited application flow monitoring support is the Next-Generation Firewall from Palo Alto Networks [104]. It supports application identification and can export application labels using the NetFlow v9 protocol. Similarly to the Cisco AVC, no application visibility apart from the identification is provided.

All presented flow exporters and flow monitoring devices struggle to achieve high throughput to be able to monitor high-speed networks. However, we avoided a direct comparison of the declared throughput of the flow probes, as it highly depends on the nature of the traffic, the supported application protocols, and the hardware configuration. More information about flow monitoring in high-speed networks is provided in Chapter 5.

3.3 Application Flow Definition

To describe the application flow creation process, an application flow definition should be introduced. Before that, we need to clarify what is meant by the different flow monitoring types.

Flow monitoring is defined by Definition 2.1. It is a superset of all of the following flow monitoring types.

Basic flow monitoring is flow monitoring that does not utilise application information in any way. It is a complement to application flow monitoring.

Application flow monitoring is flow monitoring that utilises application information. It is a complement of basic flow monitoring.


Figure 3.1: Relations between Different Flow Monitoring Types. [Figure: a diagram in which flow monitoring is the superset; basic flow monitoring and application flow monitoring are complementary parts of it, and IP flow monitoring overlaps both.]

IP flow monitoring is often used for flow monitoring that uses information from IP headers in flow keys. It is often used as a synonym for flow monitoring in the literature and is based on the NetFlow v9 and IPFIX flow definitions.

Figure 3.1 demonstrates the relations between the described types of flow monitoring. Note that basic flow monitoring and application flow monitoring are complements and IP flow monitoring can be both at the same time.

We have already stated that application flow monitoring is flow monitoring that uses application information. Let us now consider how that information can be used for creating flows and flow records. Firstly, the goal is to provide application layer information in the flow records. Therefore, the flow record needs to be extended with fields containing information extracted from the application layer of the packets of the flow. Secondly, application logic can affect the flow creation process. For example, a flow with a continuous HTTP connection can be split into multiple flows based on individual observed HTTP requests. Moreover, the application logic that affects flow creation is not limited to application layer data, although the difference between basic flow monitoring and application flow monitoring is very thin in this case. Consider a scenario where Generic Routing Encapsulation (GRE) is used. As a basic flow, the GRE tunnel would be observed as a single flow. However, the GRE protocol can be considered an application, and by using the semantics of this application, we can split the single flow into many flows based on the traffic that flows through the GRE tunnel. Thirdly, additional information can be inserted from external sources based on the semantics of properties extracted from the packets. For example, geolocation information can be added to the flow records by the probe based on the IP addresses of the communicating parties.

Now that the requirements for application flow monitoring have been specified, we can formulate the application flow definition as follows:

Definition 3.1 An application flow is a flow where either the content of the application layer or application logic is used to derive the flow keys.


The definition specifies what an application flow is. Notice, however, that the definition does not cover all flows that are created by application flow monitoring. The problem is that the definition of application flow cannot specify how the subsequent flow record derived from the application flow will be created. For this reason, we need a definition of application flow record as well:

Definition 3.2 An application flow record is a flow record that contains information derived from either:

1. data contained in the application layer of the flow, or
2. an external source of information.

Application flow monitoring is a subset of flow monitoring where application flows or application flow records are used. Notice that an application flow record can be created from a standard flow record by extracting information from the application layer. Conversely, standard flow records can be created from application flows. Therefore, either the use of application flows or application flow records is enough to call the related monitoring process application flow monitoring.
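To make the distinction concrete, the following sketch (our illustration; all field names and sizes are assumptions) contrasts a basic flow record with an application flow record that adds values derived from the application layer and from an external source, such as a geolocation database.

#include <stdint.h>

/* Basic flow record: only link/network/transport layer properties. */
struct basic_flow_record {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
    uint64_t packets, octets;
    uint64_t first_ts_ms, last_ts_ms;
};

/* Application flow record (Definition 3.2): the basic record extended with
 * values derived from the application layer of the flow and with values
 * taken from an external source of information. */
struct app_flow_record {
    struct basic_flow_record base;
    /* derived from the application layer */
    char     http_host[64];
    char     http_url[128];
    uint16_t http_status;
    /* derived from an external source (e.g., IP geolocation) */
    char     src_country[3];
    char     dst_country[3];
};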

3.4 Creating Application Flow

Application flow monitoring significantly affects the whole flow monitoring process including the data processing on the collector. The most important impact is on the packet processing and flow creation processes of the flow exporter. The packet reception is not affected unless the packets are preprocessed in a hardware-accelerated NIC. This section describes important features of application flow monitoring and their impact on the flow monitoring process.

3.4.1 Packet Processing

The primary goal of packet processing is to extract the values of chosen properties of individual packets and packet treatment information. Packet processing in basic flow monitoring can be done using simple packet parsers. These packet parsers only need to be able to process packet headers with simple structures, and the type of the following header is almost always determined by the content of the previous header. However, the type of application in the packet payload cannot usually be recognised from the packet headers. It is not necessary, since every client must know where to connect, and the server is expected to communicate only with clients using the correct protocol. Therefore, the flow exporters must identify the communicating applications themselves if they are to process the application payloads.

There are three techniques that are used for application identification nowadays. The first and oldest one is based on well-known port numbers assigned and maintained by IANA [105]. Although this method is very simple, fast, and easily implemented, it is not very precise. Since one of the goals of flow monitoring is to provide reliable data for network security, it would be easy for malicious parties to use ephemeral port numbers for their activities to hide their traffic. Moreover, well-known port numbers are often changed legitimately by administrators to improve security. Although this practice is condemned as a case of "security through obscurity", it helps to mitigate a large number of attacks. The second, more reliable, method of application identification is signature based. The application data of each protocol often carry a specific header which can be recognised by pattern matching. Such headers are usually present in the first packets of the connection; therefore, the identification happens early enough for the packet processing to be able to process the relevant application data. There are several application identification libraries and related sets of signatures. The paper by Bujlow et al. [106] compares the performance of two commercial and four open-source traffic classification tools.


Figure 3.2: Traffic Encapsulation Example. [Figure: two packets with the header sequence Ethernet | IPv4 | IPv4 | TCP | Application share the outer IPv4 addresses 192.168.1.1 → 192.168.1.2 but carry different inner IPv4 addresses (10.0.0.1 → 10.0.0.2 and 10.1.1.3 → 10.1.1.4). A basic flow record aggregates both packets into a single flow (192.168.1.1 → 192.168.1.2, 2 packets), whereas application flow records are split by the inner addresses into two flows of 1 packet each.]

Thirdly, machine learning can be used for application identification, as described in Section 3.2. However, these methods often require features that are available only after the flow is exported, such as the number of packets or the timing between individual packets. Since machine learning usually cannot recognise the application after the first data packet, which often contains the most important information, it is not used for traffic classification by flow exporters.

We have already shown examples of aggregated and non-aggregated basic flow properties in Chapter 2. Properties extracted from application data are mostly aggregated, as they are not present in every packet of the flow. Moreover, they are very often present only in a single packet of the flow, e.g., the properties found in an HTTP header or the parameters of a TLS connection during the handshake. The aggregation function applied in such cases is firstOf, which means that the first value of the property is used.

Basic flow records are usually created from the first layers of the packets. The most common example is flow monitoring skipping the link layer (e.g., Ethernet), extracting information from the network layer (e.g., IPv4 or IPv6) and the transport layer (e.g., TCP, UDP, ICMP). However, the protocol encapsulation can be much more complicated. For example, protocols such as IP in IP, Generic Routing Encapsulation (GRE), and PPTP allow creating virtual connections and provide a second layer of headers in each packet. Basic flow monitoring usually creates flows from the first layer it encounters. However, when the flow exporter searches for the application layer, it should traverse all underlying layers as well. Moreover, the layer from which the flow records are created must be well defined. It is even possible to create flows based on higher layers and therefore split a single basic flow record based on its content. More details on this process are provided in Chapters 4 and 7.

The creation of application layer parsers is a complicated process. Although there are several tools designed for this purpose, as shown in Section 3.2, it is often more efficient to create a particular parser from scratch. We have built several application parsers by hand and using lexical analysis tools for the HTTP protocol and evaluated their performance in Section 3.5.

3.4.2 Flow Creation

The flow creation process is affected in two ways. Firstly, properties derived from application values can be part of a flow key. This is the case for encapsulated protocols where the payload contains more IP headers, as shown in Figure 3.2. If the addresses of the second IP header are used in flow keys, the basic flow is split into multiple flows, as described in the previous

section. Secondly, the flows can be split when a certain application event occurs. The description of the second case follows.

Application protocol measurement may require a flow record to be expired early. For example, when the HTTP protocol uses pipelining, multiple requests and responses can be carried out over a single connection. When it is desirable to keep track of each request/response pair, the existing flow record might be exported when a new request is encountered on the same connection. This is an example of a situation in which application logic affects the creation of flows, for which reason they become application flows. Note that application flow monitoring may affect the number of generated flow records. This needs to be taken into consideration during further processing of the flow records.

Based on the previous example, we can see that application flow monitoring affects not only flow keys, but the flow expiration as well. Therefore, we add application event to the existing flow expiration reasons (inactive timeout, active timeout, natural expiration, resource constraints, exporter termination). The application event expiration is usually implemented when a single connection is reused for multiple events of the same kind. Other examples apart from HTTP pipelining are an FTP file transfer, where multiple files are transferred using a single connection, or an SMTP connection where multiple messages are delivered at once.

The flow record values extracted from the application layer are often present only in a single packet (e.g., HTTP header properties). This causes the information to be lost when the flow is long and an active timeout is triggered. The new flow record cannot contain the application information, since it is not present anymore during its creation. There are two ways to solve this issue. The first is to do nothing during the flow creation and handle the problem during flow data processing. The previous flow records can be looked up, and the necessary information can be extracted from them. The advantage of this method is that statistical queries about application layer data are not affected, since the application data are always confined to a single record in this case. The second option is to copy the relevant application information to the new flow record upon active timeout. The information is then easily accessible during flow data processing; however, it is duplicated, and special care needs to be taken when performing statistical queries on the flow data. Either of these options can be taken, but the flow data processing system must be aware of it.

A negative impact of application flow monitoring on the flow creation process is the increased size of flow records. The information extracted from application protocols can be quite large in comparison to the network and transport layer lengths. While the typical IPv4 header length is 20 bytes and the typical TCP header length is 32 bytes, an HTTP URL can easily be several hundred bytes long. Therefore, the length of flow records in application flow monitoring is several times larger than without the application layer. There are two primary negative impacts of such large flow records. Firstly, the flow cache might require much more RAM than the basic flow monitoring flow cache. Secondly, even if the cache fits into RAM, it degrades the performance of the memory accesses, because data locality is decreased and the CPU experiences more cache misses.
For these reasons, it must be carefully considered which information is placed in each flow record and how it is encoded.
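The application event expiration described above can be sketched as follows (our illustration; the flow record structure and the export hook are hypothetical and heavily simplified). When a second HTTP request is observed on a connection whose record already holds one, the record is exported and reset, so that each request/response pair ends up in its own application flow record.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified flow cache entry; real records carry many more fields. */
struct flow_record {
    uint64_t packets;
    bool     http_request_seen;
};

/* Hypothetical export hook: emit the record and start a fresh one. */
static void export_and_reset(struct flow_record *fr)
{
    printf("expired on application event after %llu packets\n",
           (unsigned long long)fr->packets);
    fr->packets = 0;
    fr->http_request_seen = false;
}

/* Returns true when the TCP payload starts with a new HTTP request line. */
static bool is_http_request_start(const uint8_t *p, size_t len)
{
    static const char *const methods[] = { "GET ", "POST ", "HEAD ", "PUT ",
                                           "DELETE ", "OPTIONS ", "TRACE " };
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        size_t m = strlen(methods[i]);
        if (len >= m && memcmp(p, methods[i], m) == 0)
            return true;
    }
    return false;
}

/* Called for every packet matched to an existing flow cache entry. */
static void on_packet(struct flow_record *fr, const uint8_t *payload, size_t len)
{
    if (is_http_request_start(payload, len)) {
        if (fr->http_request_seen)
            export_and_reset(fr);      /* application event expiration */
        fr->http_request_seen = true;
    }
    fr->packets++;
    /* ...update the remaining counters and parse the application layer... */
}

int main(void)
{
    struct flow_record fr = { 0, false };
    const char *req = "GET /index.html HTTP/1.1\r\n";
    on_packet(&fr, (const uint8_t *)req, strlen(req));   /* first request */
    on_packet(&fr, (const uint8_t *)req, strlen(req));   /* pipelined request */
    return 0;
}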

3.4.3 Flow Export

The flow export is affected by application flow monitoring in two ways. Firstly, due to the larger flow records, the amount of exported data increases. Secondly, when a template-based protocol such as IPFIX is used, the number of templates grows quickly with each new combination of supported protocols.

The most important change for flow export when application flow monitoring is applied is that the flow records might be much longer. This can cause congestion of the management link.


When only a part of the data is necessary for flow processing on the collector, the exported data can be shortened. For example, only the hostname and a fixed-length beginning of the URL can be exported for the HTTP protocol. However, since this leads to a decreased quality of the exported data, information about the original data, such as the original URL length, should be exported as well. The additional data allow the collector to correctly estimate the amount of missing information and apply the necessary algorithm to account for it. Long flow records also have a significant impact on flow record fragmentation when the UDP protocol is used. Flow records longer than the network interface MTU cannot fit into a single packet and must be fragmented, unless the transport protocol, such as TCP, or the flow export protocol handles fragmentation instead. The fragmentation puts additional load on flow collectors and causes increased data loss.

Exporters using template-based protocols, such as IPFIX, need to manage a significant number of templates when application flow monitoring is applied. This can cause performance issues, since a template lookup needs to be performed for each flow record. Moreover, flow collectors need to process the templates as well. If the data are stored, for example, in a relational database and each template is represented by a table, a large number of tables can cause the queries to be slow, since a union over many tables would need to be performed. Similar problems arise for other forms of flow storage and processing as well.
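A shortened export of an HTTP URL together with its original length, as suggested above, might look like the following sketch (our illustration; the field names and the 64-byte limit are assumptions, not defined IPFIX elements). The collector can compare original_length with the received prefix to estimate how much information is missing.

#include <stdint.h>
#include <string.h>

#define EXPORTED_URL_MAX 64   /* fixed-length prefix chosen for export */

/* Hypothetical exported representation of an HTTP URL: a truncated prefix
 * plus the original length, so the collector knows how much was cut off. */
struct exported_url {
    char     url_prefix[EXPORTED_URL_MAX];
    uint16_t original_length;
};

static void prepare_url_for_export(const char *url, struct exported_url *out)
{
    size_t len = strlen(url);
    size_t copy = len < EXPORTED_URL_MAX - 1 ? len : EXPORTED_URL_MAX - 1;
    memcpy(out->url_prefix, url, copy);
    out->url_prefix[copy] = '\0';
    out->original_length = (uint16_t)(len > UINT16_MAX ? UINT16_MAX : len);
}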

3.5 Design of an HTTP Parser: A Study

It has been observed that the HTTP protocol became a "new Transmission Control Protocol" (TCP). More and more applications rely on the HTTP protocol, e.g., Web 2.0 content, audio and video streaming, and instant messaging. HTTP traffic (TCP port 80) can usually pass through most firewalls and therefore presents a standard way of transporting/tunnelling data. The versatility, ubiquity, and amount of HTTP traffic make it easy for an attacker to hide malicious activities. Missing application layer visibility renders basic flows ineffective for HTTP monitoring.

The research presented in this section attempts to answer the following question: What are the impacts of application layer analysis of the HTTP protocol on flow exporters and the flow monitoring process? The contribution of our work in this section is threefold: Firstly, we have designed and evaluated several HTTP protocol parsers representing current state-of-the-art approaches used in today's flow exporters. Secondly, we have introduced a new flex-based HTTP parser. Thirdly, we report on the throughput decrease (performance implications of the application parser), which is of the utmost importance for high-speed deployments.

The rest of this section is organized as follows. Subsection 3.5.1 describes related work. Subsection 3.5.2 contains a description of the HTTP inspection algorithms and the framework that was used to test the algorithms. Subsection 3.5.3 describes the methodology used for the HTTP parsers performance comparison. Subsection 3.5.4 presents the performance evaluation of the individual algorithms. Finally, Subsection 3.5.5 contains our conclusions.

3.5.1 Related Work

Application layer protocol parsers are an integral part of many network monitoring tools. We explored the source code of the following frameworks to see how the HTTP parsing is implemented. nProbe uses standard glibc [107] functions like strncmp (compare two strings) and strstr (locate a substring). YAF uses Perl Compatible Regular Expressions (PCRE) [108] to examine HTTP traffic. Suricata [109] and Snort [110] are both written in C. Suricata uses the LibHTP [111] library, which does HTTP parsing using custom string functions, while Snort does its parsing using glibc functions. httpry [112] is another HTTP logging and information retrieval tool which

is also written in C and uses its own built-in string functions. These HTTP parsers are hand-written. Another approach is taken by the authors of Bro [113]. They use binpac [89], which is a declarative language with a compiler designed to simplify the task of constructing robust and efficient semantic analysers for complex network protocols. They replaced some of Bro's existing analysers (handcrafted in C++) and demonstrated that the generated parsers are as efficient as carefully hand-written ones.

In this section, we try to determine whether these approaches to HTTP parsing can handle large traffic volumes. Besides the above-mentioned approaches, we propose to use the Fast Lexical Analyzer (Flex) [114] to design a new HTTP parser. Flex converts expressions into a lexical analyser that is essentially a deterministic finite automaton that recognises any of the patterns. The algorithm that converts a regular expression directly to a deterministic finite automaton is described in [115] and [116].

There are other works that inspect the HTTP protocol headers. In [117] the authors use statistical flow analysis to differentiate traditional HTTP traffic and Web 2.0 applications. In [118] the authors identify HTTP sessions based on flow information. In both cases, a ground truth sample is needed, which is a topic addressed by [119]. In [120] and [121] the authors use DPI to obtain information from the HTTP headers. Our approach is orthogonal to these works, since we are interested in extending basic flow records with HTTP data.

3.5.2 Parser Design

The HTTP protocol [122] has a number of properties that can be monitored and exported together with basic flow data. The most commonly monitored ones are present in almost every HTTP request or response header. Based on the properties monitored by the state-of-the-art DPI tools, we selected the following ones for our parsers: HTTP method, status code, host, request URI, content type, user agent, and referer. Keeping track of every bidirectional HTTP connection is too resource consuming on high-speed networks. Thus, we focus on the evaluation of each individual packet. This approach is common for flow exporters, since it is more resistant to resource depletion attacks.
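For illustration, the seven selected properties can be grouped into a per-packet record such as the following sketch (ours; the field sizes are arbitrary assumptions). Since each packet is evaluated independently, the structure is filled from a single request or response header without any per-connection state.

#include <stdint.h>

/* The seven HTTP properties selected for export, filled separately for each
 * packet; request packets leave status_code empty and response packets leave
 * the request-only fields empty. */
struct http_fields {
    char     method[10];        /* requests only, e.g., GET, POST */
    uint16_t status_code;       /* responses only */
    char     host[64];
    char     request_uri[128];
    char     content_type[48];
    char     user_agent[128];
    char     referer[128];
};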

1: if first line contains "HTTP" then
2:   while not end of HTTP header do
3:     for every parsed HTTP field do
4:       if field matches the line then
5:         store the value of the line
6:       end if
7:     end for
8:     move to the next line
9:   end while
10:  return HTTP packet
11: else
12:  return not HTTP packet
13: end if
Algorithm 3.1: strcmp.

1: if first line contains "HTTP/x.y" then
2:   for all PCRE rules do
3:     if rule matches then
4:       store the matched value
5:     end if
6:   end for
7:   return HTTP packet
8: else
9:   return not HTTP packet
10: end if
Algorithm 3.2: pcre.

We implemented and evaluated three different types of parsing algorithms. The first algorithm (strcmp approach) loops through the HTTP header line by line and searches each line for the given fields. It uses standard glibc string functions like memchr, memmem, and strncmp. The simplified pseudocode is shown in Algorithm 3.1. The second algorithm (pcre approach) uses several regular expressions taken from YAF to search the packet for specific patterns indicating HTTP header fields.


Figure 3.3: flex Algorithm Schema. [Figure: a finite automaton with a protocol labeling phase and a protocol parsing phase; states Start HTTP, Initial HTTP, HTTP Request, HTTP Headers, and End HTTP; transitions labelled Method + URL, Response + status code, Host, User-Agent, Referer, Content-Type, invalid character, and EOF or \r\n\r\n or \r or \n.]

The pseudocode for the pcre algorithm is shown in Algorithm 3.2. We designed the third algorithm (flex approach) to handle each packet as a long string. It uses a finite automaton to find the required HTTP fields, and the Flex lexer is used to process the packets. The automaton design is shown in Figure 3.3.

Since Flex is a generic tool, its initialisation before scanning each packet is quite an expensive operation. Therefore, we decided to remove all unnecessary dynamic memory allocations and costly initialisations to see whether the performance can be increased. We named the new version optimized flex. The disadvantage of Flex is that it has to keep the data in its own writeable buffer. Therefore, the received data must be copied to such a buffer, which adds to the processing costs significantly. The advantage of the flex parser is its simple maintenance and extension possibilities. The framework can be modified to parse any other application layer protocol just by changing the set of regular expression rules. The strcmp parser would have to be rewritten from scratch.

The strcmp implementation also offers space for further improvement. Algorithm 3.3 shows an optimized strcmp version of the code that features better processing logic. The optimised version searches for specific strings by comparing several bytes at once, which is done by casting the character pointer to an integer pointer. The number that is compared to the string is computed from the ASCII codes of the characters and converted to network byte order. The size of the used integer depends on the length of the string; longer integers offer superior performance.

To focus only on the HTTP parsing algorithms, we decided to let the FlowMon exporter [99] handle the packet preprocessing. We used a benchmarking (input) plugin that reads packets from a PCAP file to memory at start-up. Then it supplies the same data continuously for further processing. This approach allows us to focus on benchmarking the algorithms without the necessity of considering the disk I/O operations. We provide the source code of the implemented algorithms and the used packet traces at our homepage [123].
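The multi-byte comparison used by the optimized strcmp parser can be illustrated by the following sketch (ours, not the thesis code; it uses memcpy instead of casting the character pointer to an integer pointer, which avoids alignment problems but expresses the same idea). The constant is computed from the ASCII codes of the characters and converted to network byte order, so matching the first four payload bytes against "GET " becomes a single integer comparison.

#include <arpa/inet.h>   /* htonl */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compare the first four payload bytes against "GET " in one 32-bit test.
 * 0x47455420 are the ASCII codes of 'G', 'E', 'T', and ' '. */
static bool starts_with_get(const uint8_t *payload, size_t len)
{
    if (len < 4)
        return false;
    uint32_t word;
    memcpy(&word, payload, sizeof word);   /* load four bytes at once */
    return word == htonl(0x47455420u);
}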

3.5.3 Evaluation Methodology

In this section, we define a methodology for the evaluation of HTTP protocol parsers. We focus on the parsing performance (the number of processed packets per second) of the algorithms described in Section 3.5.2 from several different perspectives. The first perspective focuses on the performance comparison with respect to the analysed traffic structure. The second perspective covers the impact of the number of HTTP fields supported by the parser. The third perspective describes the effect of the distribution of the Carriage Return (CR or '\r') and Line Feed (LF or '\n') control characters in the packet payload.


1: if payload begins with "HTTP" then
2:   store status code
3:   while not end of HTTP header do
4:     for every parsed response HTTP field do
5:       if line starts with field name then
6:         store the value of the line
7:       end if
8:     end for
9:     move to the next line
10:  end while
11:  return HTTP response packet
12: end if
13: if payload begins with one of GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT then
14:   store request URI
15:   while not end of HTTP header do
16:     for every parsed request HTTP field do
17:       if line starts with field name then
18:         store the value of the line
19:       end if
20:     end for
21:     move to the next line
22:   end while
23:   return HTTP request packet
24: end if
25: return not HTTP packet
Algorithm 3.3: Optimized strcmp.

A common technique for increasing network data processing performance is processing only the important part of each packet. Therefore, we perform each of the tests in two configurations. In the first configuration, the parsers are given whole packets. This is achieved by setting the limit on packet size to 1500 bytes, which is the most common maximum transmission unit value on most Ethernet networks. In the second configuration, the parsers are provided with truncated packets of length 384 bytes, which is the minimum packet length recommended for DPI by the authors of the YAF exporter [94].

To test the performance of the parsers, we created an HTTP traffic trace (testing dataset). Our requirements on the dataset were as follows: preserve the characteristics of the HTTP protocol, reflect various HTTP traffic structures, and have no side effects on the flow exporter. In order to meet these requirements, we decided to create a synthetic trace. The HTTP protocol is a request/response protocol. To preserve the characteristics of the HTTP protocol during the testing, a random request, response, and binary payload packet were captured from the network. To omit the undesirable bias of the measurement, only these three packets were used to synthesise the test trace. The final test trace consists of 200 packets. In order to reflect various traffic structures, we suggested the following ratio:

r = \frac{\#\text{request packets} + \#\text{response packets}}{\#\text{all packets}} \times 100 \qquad (3.1)

where r ∈ [0, 100], and created a test set for each integer ratio from the interval. Further, we created two packets with a modified payload. One packet contained the CR and LF control

characters only at the very beginning of the packet payload, the other one only at the end. For both of the modified packets and the unchanged packet, the test trace for each integer ratio was created.

Having defined the test trace, we propose the following case studies to cover all evaluation perspectives. The case studies are carried out for both full and truncated packets. Moreover, we measure the performance of the flow exporter without an HTTP parser (no HTTP parser). This way we can estimate the performance decline caused by increased application layer visibility.

1. Performance Comparison: This case study compares the parsing performance of the implemented parsers. Moreover, we report on the flow exporter performance without an HTTP parser (no HTTP parser).

2. Parsed HTTP Fields Impact: This case study shows the parser performance with respect to the number of supported HTTP fields. We incrementally add support for new HTTP fields and observe the impact on the parser performance.

3. Packet Content Effect: The result of this study presents the influence of the position of the CR and LF control characters in a packet payload on the parser performance. The test traces containing modified payload packets are used to perform the measurement.

The performance evaluation process employs the benchmarking input plugin (see Subsection 3.5.2) to obtain the number of processed packets per second. In order to avoid influencing the results, the plugin uses a separate thread and CPU core for the accounting. The plugin counts the number of processed packets in a ten-second interval and then computes the packets per second rate. We have operated the benchmark plugin for fifty seconds for each test trace and computed the number of packets processed and the standard error of the measurement. The parsed HTTP header fields impact and the packet content effect were assessed in a similar way. All measurements were conducted on a server with the following configuration: Intel Xeon E5410 CPU at 2.33 GHz, 12 GB 667 MHz DDR2 RAM, and Linux kernel 2.6.32 (64 bit).
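The rate and error computation can be illustrated by the following sketch (ours; the interval counts are made-up values, and we assume that the fifty-second run yields five ten-second intervals). The per-interval packet counts are converted to packets per second, averaged, and the standard error of the mean is derived from the sample variance.

#include <math.h>
#include <stdio.h>

#define INTERVALS 5               /* five ten-second intervals in a 50 s run */
#define INTERVAL_SECONDS 10.0

int main(void)
{
    /* Hypothetical packet counts observed in each ten-second interval. */
    double counts[INTERVALS] = { 3.1e7, 3.0e7, 3.2e7, 3.1e7, 3.0e7 };
    double rates[INTERVALS], mean = 0.0, var = 0.0;

    for (int i = 0; i < INTERVALS; i++) {
        rates[i] = counts[i] / INTERVAL_SECONDS;    /* packets per second */
        mean += rates[i];
    }
    mean /= INTERVALS;
    for (int i = 0; i < INTERVALS; i++)
        var += (rates[i] - mean) * (rates[i] - mean);
    var /= (INTERVALS - 1);                         /* sample variance */
    double se = sqrt(var / INTERVALS);              /* standard error of mean */

    printf("mean rate: %.0f packets/s, standard error: %.0f packets/s\n",
           mean, se);
    return 0;
}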

3.5.4 Parser Evaluation

In this section, we present the results of the HTTP parser evaluation. First, we describe the parser performance comparison; then we investigate the impact of the supported HTTP header fields. Finally, the effect of the packet content on the HTTP parsing performance is shown.

Performance Comparison.

This case study uses the standard version of each parser that supports seven HTTP fields. The dataset containing the unmodified payload packets is used, and the parsers are tested both on full and truncated packets. Figure 3.4 shows the results for the full packets case study and Figure 3.5 shows the performance evaluation for truncated packets.

First we discuss Figure 3.4. The exporter with no HTTP parser is capable of processing more than 11 million packets per second. This result is not influenced by the application data carried in the packet, since the data are not accessed by the no HTTP parser. Employing even the fastest of the HTTP parsing algorithms, the performance drops to nearly one-half of the parsed packets per second. All of the HTTP parsers show a decrease in performance as the ratio r increases. This is caused by the growing amount of request and response packets, which are more time demanding to parse.

The best performance is achieved by the optimized strcmp parser, which uses application protocol and code level optimisations. The parser takes into account the HTTP header structure and the difference between HTTP request and response headers, and looks only for header fields that can be found in the specific header type.


Figure 3.4: Parser Performance Comparison with Respect to HTTP Proportion (0 % – No HTTP, 100 % – Only HTTP Headers) in the Traffic – Full Packets 1500 B. [Figure: packets/s (× 10^6) of the no HTTP, optimized strcmp, strcmp, optimized flex, flex, and pcre variants plotted against the HTTP proportion from 0 % to 100 %.]

The code level optimisations include converting static strings into integers and matching them against several characters at once, which can be done in one processor instruction. The strcmp parser performance is the second best, although its throughput is less than half of the optimized strcmp parser. The main difference between the flex and optimized flex parsers is in the automaton initialization process. By rewriting the initialisation process, we achieved a slight performance improvement, which is noticeable mainly in the [0 %, 20 %] interval, where the actual HTTP parsing time is short.

There is one other important factor affecting the flex parser performance. The flex automaton is designed to work with its own writable buffer, since it marks the end of individual parsed tokens directly in the buffer. For this reason, a copy of the packet payload must be created before the actual parsing can start. To measure the impact of the copying, we created another two parser plugins called empty and copy. First, we measured the flow exporter throughput with the empty plugin, which performs no data parsing, then with the copy plugin, which only copies the packet payload to a static buffer. From the results, we estimate the throughput the optimized flex parser would have without the memory copying. The performance of the optimized flex parser would be about 2.4 million packets per second for 0 % and 0.33 million packets per second for 100 % HTTP packets. This shows that the actual HTTP parsing, when compared to the strcmp parser, is slightly faster for binary payload packets and slower for HTTP header packets.

The performance of the pcre parser is the lowest. The PCRE algorithm converts the regular expression to a tree structure and then performs a depth-first search while reading the input string. In case there is no match in the current tree branch, the algorithm backs up and tries another one. Therefore, for a complex regular expression, the pattern matching is not as fast as a simple string search using functions like strcmp. Another reason why the pcre parser is not fast is that it performs all searches on the whole packet payload. The other algorithms process the data sequentially.

Figure 3.5 shows the results for truncated packets. The optimized strcmp and no HTTP variants are only slightly faster, since the truncating of the packets has a positive impact on CPU data cache utilisation. The strcmp algorithm is flawed, since its throughput on HTTP packets deteriorates rapidly. This shows the disadvantage of hand-written parsers, as they are more error-prone than the generated ones. The pcre parser performance is almost doubled, as the repeatedly processed data are truncated. The flex-based parsers also achieve a performance increase, since the memory copying costs are reduced for smaller data.


Figure 3.5: Parser Performance Comparison with Respect to HTTP Proportion (0 % – No HTTP, 100 % – Only HTTP Headers) in the Traffic – Truncated Packets 384 B. [Figure: packets/s (× 10^6) of the no HTTP, optimized strcmp, strcmp, optimized flex, flex, and pcre variants plotted against the HTTP proportion from 0 % to 100 %.]

Figure 3.6: An HTTP Parser Throughput for 1500 B Packets; Supported Fields – (0) none – HTTP Protocol labeling, (1) +host, (2) +method, (3) +status code, (4) +request URI, (5) +content type, (6) +referer, (7) +user agent. [Figure: packets/s (× 10^6) of the optimized strcmp, strcmp, optimized flex, flex, and pcre parsers plotted against the number of supported fields from 0 to 7.]

Parsed HTTP Header Fields Impact.

This case study was designed to show the impact of the number of parsed HTTP header fields on the parser performance. When payload packets are detected, their content is not parsed for additional HTTP header fields. Therefore, a test set containing only HTTP request and response packets was used.

The case study starts with an empty plugin that does not parse HTTP header fields and merely labels the HTTP packets. In the next steps, we cumulatively add further header fields to parse until we parse all of the seven supported fields. We run the tests for both full and truncated packets. The average performance of the parsers for each of the added fields is shown in Figure 3.6. Only the request and response packets are parsed, thus the values for the seven parsed fields in Figure 3.6 correspond to the 100 % packets/s values in Figure 3.4 and Figure 3.5. For the same reason, the parsed packets per second numbers are lower in comparison with Figure 3.4 and Figure 3.5.

The performance of the strcmp and pcre parsers drops with each additional parsed HTTP header field. The performance of the optimized strcmp parser slightly fluctuates, as shown in Figure 3.6. An example is the performance increase when adding a (4) request URI or a (3) status code.


Figure 3.7: Packet Content Effect – Packet Length 1500 B. [Figure: packets/s (× 10^6) for test traces with the CRLF characters at the beginning, at the end, and unchanged, plotted against the HTTP proportion from 0 % to 100 %.]

It is caused by an extra code snippet that extracts the URI so that this line is not processed by the more generic code designed for parsing other header fields. Due to the usage of the finite automaton, the data are always processed in one pass by the flex-based algorithms. Therefore, they retain the same level of performance for all additional fields. This feature could be used to automatically build powerful parsers when a large number of parsed application fields would make it ineffective to create hand-written parsers. Same as in the previous case study, the parsers processing truncated packets show better performance than the parsers working on full packets.

Packet Content Effect.

This case study investigates the possible effects of the position of the CR and LF control characters in the packet payload on the parser performance. The mentioned ASCII characters represent the end of a line in the HTTP header. Some of the proposed algorithms use these characters as the trigger to stop parsing. Therefore, the position of these characters affects the performance of the parsers. The packets with the CRLF characters at the very beginning should be parsed faster than the packets having the CRLF at the end, since the algorithm terminates as soon as it identifies the CRLF characters. The test sets with modified binary payload packets (see Subsection 3.5.3) enable us to compare the algorithms from this perspective.

We used the modified binary payload packets to test the parsers. The parsing algorithms, except the strcmp algorithm, show an insignificant difference in their performance for all variants of the modified packets. The pcre and optimized strcmp parsers do not search for the end of line characters in order to label the packet; therefore, this test does not affect them. The flex-based algorithms are not significantly affected, since they stop parsing on the first character that is not expected in an HTTP header and therefore stop at the first character in any case. The strcmp parser depends on the search for end of line characters, which is confirmed by Figure 3.7. The sooner the characters are found, the faster the algorithm terminates. The scenario with truncated packets is different, since the performance on the end dataset is greater than on the unchanged data. This is caused by removing the end of the packet payload together with the end of line character. When the strcmp algorithm cannot find the character, it terminates immediately without trying to search the data. Therefore, it terminates sooner than on the unchanged dataset, where the end of line character is found and the search continues.

3.5.5 Conclusions

This section has assessed the impacts of HTTP protocol analysis on flow monitoring performance. We implemented the state-of-the-art approaches to HTTP protocol parsing. Moreover, a new flex-based HTTP parser was designed, and its performance was compared to the other approaches.

The evaluation shows that in our case the hand-written and carefully optimised parser performs significantly better than implementations with automated parsing.

It also shows that the new flex-based implementations handle the increasing number of parsed HTTP fields without significant performance loss. Truncating the packets prior to HTTP protocol parsing can increase the parser throughput. The performance comparison of the no HTTP parser with the HTTP parsers shows that providing application visibility is a demanding task. Current approaches to application protocol parsing may not be effective enough to process high-speed network traffic.

3.6 Summary

This chapter has described the aspects of extending flow monitoring by using information from the application layer. We have argued that application monitoring is necessary to provide information about current network-based cybersecurity threats. Without the application insight, attackers can perform application layer attacks that have no impact visible using basic flow monitoring. Therefore, application flow monitoring utilises aspects of DPI to provide more fine-grained information about the observed traffic.

We have briefly surveyed the state of the art of the application parser creation process. There are several approaches that allow creating application parsers from a higher level description, which allows creating more robust and less error-prone parsers. However, existing application flow exporters often do not use these approaches and implement their own application parsers. Moreover, although most flow exporters support application identification, only a few of them provide real application visibility.

The existing sources do not use consistent terminology regarding application flow monitoring. Therefore, we have proposed a workable terminology that captures the current state of the art in application flow monitoring. We call the flow monitoring without processing the application layer basic flow monitoring and differentiate it from IP flow monitoring, which is a term used for any flow monitoring that uses information from the IP layer. We have provided definitions of both application flow and application flow record, and explained their relation to the flow and flow record definitions provided in Chapter 2.

Once the terminology of application flow had been established, we ventured to describe the application flow monitoring process, particularly the changes that it introduces to the general flow monitoring process described in the previous chapter. It impacts the flow creation on two main levels. Firstly, a new flow expiration reason has been introduced that can be triggered by an application event. Secondly, the flow records become application flow records, as they contain information from the application layer. These changes have an impact on the number of generated flow records as well as on their sizes.

The last contribution of this chapter is a discussion of the design of an HTTP protocol parser. We have shown several approaches to the construction of the parser, implemented them, and evaluated their performance. We have quantified the flow monitoring performance drop caused by each of them and showed that a hand-crafted parser can be highly optimised to provide the best throughput, although it requires an expert programmer to implement it correctly.


Chapter 4

Traffic analysis using Application Flow Monitoring

There are many usages for application traffic monitoring. The common goal is to extend the set of collected information elements and thus provide a deeper understanding of network traffic. This chapter shows several use cases for application flow monitoring, which were published in separate papers. The contribution of this chapter is a summary of the contributions of those papers.

As a first example of traffic analysis with the use of application flow monitoring, we show how information from HTTP headers can be used to detect new classes of attacks on the application layer. This is the most common use case for application flow monitoring.

Basic flow monitoring usually stops after the first header following the IP layer. However, many networks and networking protocols utilise encapsulation to traverse network segments with unsupported equipment or to allow aggregating traffic into virtual networks. IPv6 transition mechanisms are a distinctive class of protocols used to provide IPv6 connectivity over IPv4 networks. Most operating systems provide means to deliver IPv6 connectivity even when native IPv6 is not available. Therefore, we extended the basic flow monitoring with information from the IPv6 transition mechanisms, and we report on the IPv6 deployment and the use of the transition mechanisms.

Inserting information into the flow records from external sources is also considered to be a variant of application flow monitoring. We show an example of adding geolocation information both in the flow exporter and the collector. We show that both implementations scale very well and that the geolocation data can be used for advanced traffic analysis.

Since most flow monitoring experiments are performed on a live network, it is impossible to replicate the results exactly. We show a method of characterising network traffic so that it is possible to compare multiple network traces and search for similarities. This allows us not only to compare different networks but to quantify changes in a network profile in time.

The papers included in this chapter are [A6, A7, A8, A9]. Other papers related to this chapter are [A10, A11]. The organisation of this chapter is as follows:

• Section 4.1 analyses HTTP traffic in a large-scale environment with the use of application flow monitoring. The added value of application flow monitoring is established by the detection of several previously undetected attacks.


• Section 4.2 reports on IPv6 deployment and the use of transition mechanisms. Basic flow monitoring is extended with the capability of analysing tunnelled IPv6 traffic.

• Section 4.3 shows how flow records can be extended with geolocation information and how this additional information can be employed for network analysis.

• Section 4.4 proposes a network traffic characterisation method using flow-based statistics. The characteristics are used to understand and describe the nature of the traffic better and to compare different network traces against each other.

• Section 4.5 summarizes the chapter.


4.1 Security Monitoring of HTTP Traffic Using Extended Flows

HTTP is currently the most widely used protocol, and it takes up a significant portion of network traffic. Due to its popularity, it is beneficial to have a deeper understanding of HTTP network traffic and its content. From a network security perspective, we would like to know who is accessing our network and what the requested resources are. Patterns in network traffic and outstanding numbers of visited hosts and requested resources would help us to distinguish between legitimate and malicious traffic. The most suitable way of gaining an overview of HTTP traffic in a large-scale network is application flow monitoring, as defined in the previous chapter.

We address two problems in this section. The first one is the lack of an overview of network traffic and insufficient security awareness. This applies especially in heterogeneous networks with distributed administration. Many administrators oversee the web servers under their administration and perhaps their immediate neighbourhood, but they are not aware of security threats in the rest of the network. The second problem is to find a suitable set of tools to analyse HTTP traffic and distinguish between legitimate and malicious traffic. To formalise the scope of our work, we pose two research questions which we shall answer:

(i) What classes of HTTP traffic relevant to security can be observed at the network level and what is their impact on attack detection?

(ii) What is the added value of application flow monitoring compared to basic flow monitoring from a security point of view?

The first question is focused on common types of HTTP traffic that we can observe in the network. We expect to extend the traditional division of network hosts into clients and servers. We focus on the behaviour of hosts regardless of their client/server role: we are particularly interested in detecting security-related patterns produced by attackers, proxies, and crawlers. We search for implications of the presence of particular traffic patterns. We aim to improve malicious traffic detection, filter out false positives, apply better security policies, and set up trustworthy honeypots.

The second question is focused on evaluating the contribution of application flow monitoring with respect to the detection of malicious behaviour in the network. We focus on the detection of patterns in network traffic that are hard to observe using basic flows but straightforward to detect using application flows. Our work contributes to a better understanding of current security threats. For example, we can detect vulnerability scanners and learn about the vulnerability itself at the same time.

The answers to both questions are based on an analysis of real traffic in a campus network. We use application flow monitoring that processes information from the HTTP application layer and provides HTTP elements in application flow records. Although it is possible to extract some information, such as the Server Name Indication, from the HTTPS protocol, this work focuses on the HTTP protocol only.

We also use auxiliary methods to verify and validate the results, i.e., to confirm the membership of detected IP addresses in the proposed behaviour classes. Legitimate sources of traffic should be identifiable, e.g., by a reverse DNS record of the source IP address [124]. We check User-Agents in HTTP traffic to differentiate legitimate and illegitimate traffic, e.g., requests with a forged User-Agent. Finally, we check the reachability of the traffic destination through search engines to see if it is possible to find the resource without accessing it. We follow the principle of "Google Hacking" [125], a technique used for finding vulnerabilities on web pages indexed by search engines.

This section is divided into five subsections. Subsection 4.1.1 presents previous work in this field. Measurement tools and environment are described in Subsection 4.1.2. The results are presented in Subsection 4.1.3 and discussed in Subsection 4.1.4. Finally, Subsection 4.1.5 concludes the section.

4.1.1 Related Work

Although research has focused on the analysis of HTTP traffic in recent years, there is only a small number of papers relevant to our work. We are not aware of any paper dealing with HTTP analysis using a flow-based approach. In addition to this, the motivation of the relevant papers is different, typically focusing on efficient traffic management, with only a minor focus on security, specifically attack detection and prevention. One exception is the work by Perdisci et al. [126]. They employed network-level behavioural clustering of HTTP requests generated by malware. Their motivation was to provide quality input for algorithms that automatically generate network signatures. The second exception is the work by Husák and Čegan [127], in which they use flow-based HTTP analysis to detect users who contacted malicious websites.

Other related papers dealing with HTTP traffic analysis are listed in this paragraph. Augustin and Mellouk [128] published a classification of web applications according to three traffic features: intensity (total volume of exchanged traffic), symmetry (ratio between upstream and downstream traffic), and shape (burstiness over time). Xie et al. [129] addressed reconstructing web surfing behaviour from network packet traces. Xu et al. [130] analysed User-Agent strings captured at the edge of a campus network using an extension of UASparser [131]. They identified operating systems, types of devices (mobile, desktop), and applications (browser, crawler, P2P, automated updates and requests). A similar work by Jin and Choi [132] is motivated by efficient traffic management. Hur and Kim [133] proposed a smartphone traffic classification based on grouping User-Agent fields and extracting common strings.

4.1.2 Measurement Tools and Environment

This section describes the methods and tools used for acquiring the data from our campus network. We shall also provide characteristics of the network in terms of its size and utilisation.

We performed all measurements on the campus network of Masaryk University. The network has more than 40,000 users and 15,000 active IP addresses each day in a /16 network segment. The network contains both servers and client stations, including proxy servers and a segment of honeypots. Any incoming traffic to the honeypots is by nature suspicious [134], which helps us to recognise malicious network traffic, e.g., HTTP requests on a honeypot web server. Only moderate filtering rules are applied globally to preserve the network neutrality of the academic environment.

Our primary source of data for network traffic analysis is a set of FlowMon [99] probes located throughout the campus network. These probes use the NetFlow and IPFIX protocols to export measured traffic as flow records to central collectors [15]. The collected flows are processed by various anomaly and intrusion detection systems, which report incidents to our CSIRT team. This network-centric approach allows us to monitor a complex network with many servers and services managed by different entities, as opposed to host-based approaches such as log analysis. It would be challenging to even obtain system logs from every web server in such a varied environment in any case. The monitored links are mostly 1 Gb/s and 10 Gb/s, which makes the use of DPI systems impractical, or even impossible in the case of 10 Gb/s links, due to high processing requirements.

To capture data for this section, we used a probe measuring traffic from a 10 Gb/s link which connects the campus to the ISP. We deployed measurement of the HTTP traffic on this probe using an application flow exporter. The exporter software supports the extraction of basic elements from HTTP headers, such as the host name, document URL (split into hostname and path), User-Agent string, and response code.


These elements are then exported using the IPFIX protocol as enterprise elements [37]. We used an IPFIXcol [A12] flow collector to receive and store the application flow records. A basic flow record usually consists of a flow key (4.1) and other properties providing detailed information about the flow (4.2). The most common flow key contains L3/L4 protocol numbers, IP addresses, and ports. Additional elements usually contain information from packet headers together with statistical counters, e.g., timestamps, numbers of packets and bytes, and autonomous system numbers.

Fkey = (L3Proto, srcIP, dstIP, L4Proto, srcPort, dstPort)    (4.1)

Fadditional = (timeStart, timeEnd, packets, octets, TCPflags, ToS, srcAS, dstAS)    (4.2)

A basic flow record is then a concatenation of Fkey and Fadditional. The HTTP measurement, described in [A4], adds information from the HTTP application layer (4.3).

FHTTP = (hostname, path, userAgent, requestMethod, responseCode, referrer, contentType)    (4.3)

By concatenating a basic flow record with additional application elements, an application flow record is created. In this case, it contains HTTP elements (4.4).

Fapp = Fkey · Fadditional · FHTTP (4.4)
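
To make the composition in (4.1)–(4.4) concrete, the following Python sketch builds an application flow record as a flat concatenation of the three element groups. It is illustrative only: the field names are ours and do not correspond to the exporter's internal representation or to IPFIX element identifiers.

from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FlowKey:                 # F_key, equation (4.1)
    l3_proto: int
    src_ip: str
    dst_ip: str
    l4_proto: int
    src_port: int
    dst_port: int

@dataclass
class FlowCounters:            # F_additional, equation (4.2)
    time_start: float
    time_end: float
    packets: int
    octets: int
    tcp_flags: int
    tos: int
    src_as: int
    dst_as: int

@dataclass
class HttpElements:            # F_HTTP, equation (4.3)
    hostname: str
    path: str
    user_agent: str
    request_method: str
    response_code: int
    referrer: str
    content_type: str

def application_flow_record(key, counters, http):
    """F_app = F_key . F_additional . F_HTTP, equation (4.4): one flat record."""
    return {**asdict(key), **asdict(counters), **asdict(http)}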

We measured the Fapp vector in the environment of our campus network for three weeks over the summer break and three weeks during the semester. Thus, we could compare the amount of overall network traffic and suspicious events over two distinct time periods and traffic loads of the network. The amount of traffic is lower during the summer break, although still comparable to the rest of the year. However, we assume that the amount of malicious traffic is constant over the year and is therefore relatively more apparent during the summer break. Table 4.1 shows the size of each dataset in packets, bytes, and flows. Moreover, it contains the number of HTTP requests observed. Although the number of requests is not high compared to the number of flows, there are at least as many responses, which often carry multimedia content. Therefore, the portion of HTTP traffic in bytes is very significant, despite the lower number of requests.

Dataset       Flows     Packets     Bytes        HTTP Req.
Summer 2014   3.733 G   188.953 G   198.362 TB   0.310 G
Spring 2015   6.680 G   552.523 G   604.704 TB   0.720 G

Table 4.1: Datasets.

4.1.3 Results

Four characteristics of network flows were monitored: source IP address, destination IP address, hostname, and HTTP request. We analysed only HTTP traffic incoming to our network. The destination IP addresses were restricted to the /16 IP range of Masaryk University, while the source IP addresses were restricted to all other IP ranges.


Figure 4.1: Selected Classes of HTTP Traffic (Class I, Class II, Class III).

In some selected cases, we also evaluated the HTTP requests in the opposite direction, i.e., the source IP was in our network range and the destination IP was not. We used the hostnames and destination IP addresses interchangeably during the analysis due to a negligible amount of multi-hosting in our network. Overall, we observed HTTP traffic sent to almost 500 web servers, i.e., approximately 3 % of all hosts in our network. IPv6 traffic and measurement artefacts were observed in the measurement but were omitted from the analysis due to their negligible impact on the results.

The network traffic was analysed over five-minute and one-day intervals. A five-minute interval is the default granularity of data capture in network flow monitoring, i.e., one output file contains records of five minutes of network traffic. It allowed us to observe local anomalies, not only anomalies in the overall sample traffic. The one-day interval allowed us to compare anomalies over different days and discover repeated events and typical patterns. The one-day time window was chosen as it is long enough to contain an anomaly, while it is short enough with respect to the processing time of the analysis. The statistics presented in this section are based on an analysis of one-day intervals. However, for practical reasons, the five-minute interval is more convenient for detecting malicious events. We checked that the observed events could be detected even in the shorter time window.

We classified the network traffic according to the three main parameters of an application flow record: guest (source IP address), host (destination IP address or hostname), and HTTP request. We were interested in the cardinality of the relation between the parameters rather than in the values of the parameters. A high occurrence of flows sharing one or more parameters would suggest unusual activity in the network and is key for the classification. The number of flows which share the three main parameters is, therefore, an additional parameter for classification. The parameters and selected classes are presented in Table 4.2.

            #Guests   #Hosts   #HTTP Requests   #Flows
Class I     1         1        1                n
Class II    1         n        1                n
Class III   1         >1       n                >n

Table 4.2: Traffic Classes.

We were particularly interested in the results of three queries. First, how many similar requests were demanded by one source IP to one destination IP? Second, how many destination IP addresses were contacted by one source IP address with the same request? Third, how many different requests were demanded by one source IP to one destination IP? Each query is represented by a class, as shown in Table 4.2. The classes correspond to patterns in the network traffic, as depicted in Figure 4.1. A detailed description of each query and class can be found in the following subsections.

There are many other possible classes, although we found them not to be relevant from a security perspective. For example, many guests accessing a single host with a single request suggests the popularity of a resource at the host. Such classes, however, could be interesting for general traffic classification, not security analysis.
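
The classification summarised in Table 4.2 reduces to counting flows that share a subset of the three parameters. The following Python sketch shows the grouping for Class I; the record field names and the threshold value are illustrative assumptions, not taken from the deployed system.

from collections import Counter

# Class I grouping: flows that share (srcip, dstip, path) more than
# 'threshold' times. Records are dicts with 'srcip', 'dstip' and 'path'
# keys; both the field names and the threshold are illustrative.
def repeated_requests(records, threshold=1000):
    counts = Counter((r['srcip'], r['dstip'], r['path']) for r in records)
    return {triple: n for triple, n in counts.items() if n > threshold}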

Class I: Repeated Requests

The first query asks for repeated requests between one guest and one host. We measured the number of occurrences of each triple consisting of source IP, destination IP, and requested URL. Our assumption was that the repetition itself is common, but outstanding numbers may point to undesired behaviour. For example, password-protected web services may be exposed to brute-force attacks, which we can observe as a series of similar requests. The repeated request (A) is defined as follows:

RepeatedRequest(A) ⇐⇒ A = {F | ∀F, F′ : F(srcip) = F′(srcip) ∧ F(dstip) = F′(dstip) ∧ F(path) = F′(path)} and |A| > threshold

All flows in the set share the same source IP address, destination IP address, and requested HTTP path. The total number of flows in the set is bigger than the threshold.

Sample results from one day in summer 2014 are presented in Table 4.3. As we can see, the numbers of repeated flows reach tens of thousands, which represents outstanding traffic patterns. There is a disproportion in the numbers of flows with the same source IP, destination IP, and HTTP request. While the majority of the triples were unique or repeated several times, a small number of triples were observed to be repeated several thousand times or more. The nature of these repeated requests was revealed by analysing the request. The majority of repeated requests contained a substring such as admin or login, which suggests that these traffic patterns were caused by brute-force attacks against password-protected web services. Other interesting repeated requests contained the substring proxy, which suggests communication between a client and a web-based proxy server. Furthermore, we found that other repeated requests accessed large downloadable files, e.g., ISO images of Linux distributions.

Guest   Host   HTTP Path                                                                #Flows
G1      H1     /wp-login.php                                                            46,031
G2      H2     /administrator/index.php                                                 27,965
G3      H2     /administrator/index.php                                                 27,798
G4      H3     /wp-login.php                                                            25,316
G5      H4     /pub/linux/slax/Slax-7.x/7.0.8/slax-Chinese-Simplified-7.0.8-i486.iso     5,921
G6      H5     /proxy/libproxy.pac                                                       5,036
G7      H6     /node/                                                                    4,286
G8      H4     /pub/linux/slax/Slax-7.x/7.0.8/slax-English-US-7.0.8-i486.zip             4,170
G9      H7     /wp-login.php                                                             3,632
G10     H7     /polit/wp-login.php                                                       3,632

Table 4.3: Top Repeated Requests from One Day (Dataset Summer 2014).

The second sample of results, presented in Table 4.4, was observed in one day in spring 2015. As we can see, there is a significant difference in the number of flows. However, the distribution of the various types of repeated requests is similar. One interesting anomaly is the guest G3, which performed repeated requests on eight different hosts. This guest used different requests on different hosts; however, the number of flows on each host is similar.

Guest   Host   HTTP Path                                                             #Flows
G1      H1     /proxy/libproxy.pac                                                    5,147
G2      H2     /pub/linux/fedora/epel/6/x86_64/repodata/e63d28ff5a765b11a7052496
               f481f31aebab17c28545908d54e65904a1046ec8-filelists.sqlite.bz2          4,244
G3      H3     /senat/studenti/wp-login.php                                           3,992
G3      H4     /administrator/index.php                                               3,945
G3      H5     /slovnik/administrator/index.php                                       3,934
G3      H6     /administrator/index.php                                               3,926
G3      H7     /capv2011/administrator/index.php                                      3,924
G3      H8     /index.php                                                             3,921
G3      H9     /administrator/index.php                                               3,794
G3      H10    /wp-login.php                                                          3,701

Table 4.4: Top Repeated Requests from One Day (Dataset Spring 2015).

As we can see in Table 4.5, which displays statistics for the whole monitored time window, almost half of the detected events were related to using a proxy. Brute-force password attacks were recognised in only 10.6 % of events. On the other hand, brute-force password attacks may generate more flows than other events in this class. They were the only events which generated more than 10,000 flows with the same source IP, destination IP, and HTTP path. However, we have also observed brute-force attacks of a few thousand flows, but targeting multiple hosts over a short period of time.

Subclass      Path regular expression   Portion [%]
Proxy                                   49.4
              .*libproxy.pac            45.0
              .*sviproxy.pac             4.3
              .*proxy.php                0.1
Brute-force                             10.6
              .*admin.*                  6.7
              .*login.*                  3.9
Others                                  40.0

Table 4.5: Distribution of Repeated Request.
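
The subclassification in Table 4.5 can be reproduced by matching the requested path against the listed regular expressions. The following sketch is one possible implementation; the thesis does not prescribe a particular one.

import re

# Regular expressions from Table 4.5; anything that matches neither group
# falls into the "Others" subclass.
PROXY_RE = re.compile(r'.*(libproxy\.pac|sviproxy\.pac|proxy\.php)')
BRUTE_FORCE_RE = re.compile(r'.*(admin|login).*')

def subclass(path):
    if PROXY_RE.match(path):
        return 'Proxy'
    if BRUTE_FORCE_RE.match(path):
        return 'Brute-force'
    return 'Others'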

Class II: Similar Requests on Many Hosts

The second query addresses identical requests from one source IP address to many destination IP addresses. Our assumption was that a guest accesses only a limited number of hosts or at least requests different paths from different hosts. Therefore, a guest accessing a large number of hosts with the same request (with the exception of "/") is suspicious and worth noting. We would also like to point out the similarity between this traffic pattern and network port scanning, e.g., a TCP SYN scan [135]. Therefore, we propose the name HTTP scan for this traffic class. The HTTP scan (B) is defined as follows:

HTTPScan(B) ⇐⇒ B = {F | ∀F, F′ : F(srcip) = F′(srcip) ∧ F(dstip) ≠ F′(dstip) ∧ F(path) = F′(path)} and |B| > threshold

All flows in the set share the same source IP address and requested HTTP path, while no two flows share the same destination IP address. The total number of flows in the set is bigger than the threshold.

First, we analysed only the scans of our monitored network by external hosts. Sample results from one day in summer 2014 are displayed in Table 4.6. For each scan, we state the guest, i.e., the scanner, the requested HTTP path, the number of visited hosts, and the percentage of visited hosts out of all hosts in the monitored network. There were two outstanding source IP addresses identified; one accessed almost all the web servers in our network and requested six paths. The second scanner accessed significantly fewer hosts with only one requested path. The other combinations of source IP address and path, including the "/" request, were not observed to access more than ten hosts a day. We observed up to nineteen events a day in the measurement, including events where a guest requested more than one path. The same paths were also observed to be requested by different guests.
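
In practice, the HTTP scan definition amounts to counting distinct destination addresses per (source IP, path) pair. A minimal sketch, under the same assumptions about record fields as before and with an illustrative threshold:

from collections import defaultdict

# Class II (HTTP scan): one guest requesting the same path on many hosts.
# The threshold is illustrative; Subsection 4.1.4 suggests roughly one-fifth
# of the number of web servers in the monitored network.
def http_scans(records, threshold=100):
    hosts = defaultdict(set)
    for r in records:
        hosts[(r['srcip'], r['path'])].add(r['dstip'])
    return {key: len(dsts) for key, dsts in hosts.items() if len(dsts) > threshold}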

Guest   HTTP Path                                    #Hosts   %
G1      /myadmin/scripts/setup.php                   497      100
G1      /pma/scripts/setup.php                       497      100
G1      /w00tw00t.at.blackhats.romanian.anti-sec:)   497      100
G1      /phpmyadmin/scripts/setup.php                495       99
G1      /phpMyAdmin/scripts/setup.php                494       99
G1      /MyAdmin/scripts/setup.php                   491       99
G2      /manager/html                                118       24

Table 4.6: Top HTTP Scanners from One Day (Dataset Summer 2014).

The number of detected HTTP scans varied from 3 to 19 per day over the monitored time window. Ten scans per day were detected on average, and scanning for multiple paths was common. The scan for six paths, displayed in Table 4.6, was observed from four different IP addresses in one week. The other multiple-path scans did not use more than three paths in one scan.

The second sample of results, shown in Table 4.7, is from one day in spring 2015. We can see an interesting similarity of requests in the first and second classes. The guest G1 was scanning for HTTP paths which also appeared in the results of the previous class. This suggests that the first class includes brute-force attacks and the second class some sort of network reconnaissance. Therefore, we may have observed two phases of a complex attack.

Second, we searched for the HTTP scans in the opposite direction, i.e., scanning of external networks by a host from our network. The detection in this direction is tricky, as there are no clear boundaries of the scanned networks and it is hard to set an appropriate threshold. In addition to this, even a common user may generate hundreds of similar requests, such as /favicon.ico. Therefore, we do not present any significant results. However, it would be possible to identify a scan by searching for continuously scanned network segments or by using the longest common prefix of the scanned IP addresses to estimate a scan of a subnet.


Guest   HTTP Path                  #Hosts   %
G1      /wordpress/wp-login.php    337      68
G1      /site/wp-login.php         335      67
G1      /wp-login.php              332      67
G1      /blog/wp-login.php         201      40
G2      /bins.php                  183      37
G3      /cgi-bin/test-cgi          102      21

Table 4.7: Top HTTP Scanners from One Day (Dataset Spring 2015).

Class III: Varying Multiple Requests on Multiple Hosts

The third query addressed the guests that requested a large number of unique paths on a single host. In this case, we seek guests that requested the majority of content from our hosts. Our assumption is that a large number of different requests from one guest to one host is a legitimate traffic pattern. On the other hand, outstanding numbers worth noting can also be observed. Then we looked for guests that requested an outstanding number of unique paths on more than one host. We assume that only crawlers behave in this way and that it is unusual for any other guest to request such a large number of unique URLs on more hosts. The activity of a crawler (C) is defined as follows:

CrawlerCandidate(C′) ⇐⇒ C′ = {F | ∀F, F′ : F(srcip) = F′(srcip) ∧ F(dstip) = F′(dstip) ∧ F(path) ≠ F′(path)} and |C′| > threshold1

Crawler(C) ⇐⇒ C = {F | ∀C′i, C′j : Fi ∈ C′i ∧ Fj ∈ C′j ∧ Fi(srcip) = Fj(srcip)} and |C| > threshold2

We detect crawlers as guests that requested many resources from multiple hosts. First, all flows which have the same source IP address and destination IP address, but distinct requested HTTP paths, are grouped into sets. The first threshold is a minimal number of distinct requests on a single host. Second, we select source IP addresses which appeared in more sets. The second threshold is a minimal number of crawled hosts. The thresholds were selected to include at most 10 % of the sets.

In the first phase, we confirmed that the assumed traffic pattern frequently occurs with a varying number of requested URLs. We were not able to unambiguously mark outstanding numbers. In the second phase, we filtered out the guests that did not access more than one host. The remaining guests were characterised by common signs, e.g., they accessed similar sets of hosts and requested a similar number of URLs. We have marked these guests as crawlers and present sample results from a one-day measurement in Tables 4.8 and 4.9.

Out of eleven crawlers detected in one day in summer 2014, four were identified as a well-known MSN search bot. The others were mostly local search engines, i.e., the Czech Seznam (two guests) and the social media monitoring tool Sentione (three guests). The two remaining guests were identified as part of an IntegromeDB crawler collecting biomedical data.

The second sample of results is based on a one-day measurement in spring 2015. In the course of this day, we detected a higher activity of crawlers in our network. The recognised crawlers accessed significantly more hosts and generated more unique HTTP requests. We mostly observed well-known crawlers, such as the MSN search bot and two local search engines. One interesting finding was the detection of an unknown guest behaving like a crawler. The guest's IP address had no reverse domain name assigned, and its WHOIS record pointed to a hosting company, which indicates a potentially malicious crawler.


Guest             Domain Name                             #Hosts
207.46.13.62      msnbot-207-46-13-62.search.msn.com      7
157.55.39.107     msnbot-157-55-39-107.search.msn.com     6
137.110.244.137   bnserver2.sdsc.edu                      4
157.55.39.156     msnbot-157-55-39-6.search.msn.com       4
157.55.39.6       msnbot-157-55-39-156.search.msn.com     4
37.187.28.19      z3.sentione.com                         4
137.110.244.139   integromedb-crawler.integromedb.org     3
5.135.154.106     nks02.sentione.com                      3
5.135.154.98      nks03.sentione.com                      3
77.75.73.32       fulltextrobot-77-75-73-32.seznam.cz     3
77.75.77.17       fulltextrobot-77-75-77-17.seznam.cz     3

Table 4.8: Top HTTP Crawlers from One Day (Dataset Summer 2014).


Guest           Domain Name                                  #Hosts
157.55.39.97    msnbot-157-55-39-97.search.msn.com           24
207.46.13.11    msnbot-207-46-13-11.search.msn.com           23
157.55.39.28    msnbot-157-55-39-28.search.msn.com           18
157.55.39.209   msnbot-157-55-39-209.search.msn.com          17
157.55.39.41    msnbot-157-55-39-41.search.msn.com           15
195.113.155.3   severus.mzk.cz                               11
77.75.77.123    screenshotgenerator-77-75-77-123.seznam.cz   11
77.75.77.200    screenshotgenerator-77-75-77-200.seznam.cz   11
46.229.164.99   NOT FOUND                                    10

Table 4.9: Top HTTP Crawlers from One Day (Dataset Spring 2015).

Similarly to HTTP scans, we were looking for the described activity in the opposite direction, i.e., crawling of external web servers by a crawler from our network. Again, the detection in the opposite direction is harder, as there are no clear thresholds and there are many potential false positives. The thresholds for crawler detection had to be set considerably higher to avoid false positives. However, no significant results demonstrating the advantages of this method were found.
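
The two-stage detection described above can be sketched as follows; the record fields and threshold values are illustrative, as the thesis only states that the thresholds were chosen to include at most 10 % of the sets.

from collections import defaultdict

# Stage 1 (CrawlerCandidate): group flows by (guest, host) and count
# distinct requested paths. Stage 2 (Crawler): keep guests whose candidate
# sets span more than 'threshold2' distinct hosts. Thresholds are illustrative.
def crawlers(records, threshold1=50, threshold2=3):
    paths = defaultdict(set)                      # (srcip, dstip) -> distinct paths
    for r in records:
        paths[(r['srcip'], r['dstip'])].add(r['path'])

    hosts_per_guest = defaultdict(set)            # srcip -> hosts it crawled
    for (src, dst), requested in paths.items():
        if len(requested) > threshold1:           # many distinct paths on one host
            hosts_per_guest[src].add(dst)

    return {src: hosts for src, hosts in hosts_per_guest.items()
            if len(hosts) > threshold2}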

4.1.4 Discussion

In the previous section, we outlined three relevant classes of network traffic. The classes cover similar traffic patterns and can be characterised by outstanding numbers of observed network flows. The first class revealed several interesting patterns with ambiguous results, as it can be split into several subclasses according to the HTTP request used. The second class is assumed to be solely malicious, as it contains scanning for application vulnerabilities. The third class is considered mostly legitimate and is assumed to cover the activity of search bots and crawlers. However, not all bots and crawlers are welcome in the network. Impacts on network security and the confirmation of the assumptions are discussed in this section for each class. There is also an interesting correlation between the first and second classes, which suggests that the same malicious activity may be observed in both classes.


Class I: Brute-Forcing and Proxy Servers

The first class is characterised by a large number of the same HTTP request from one guest to one host. As we can see in the results, several types of network activity are covered by this class. Generally, we cannot distinguish between them by the number of observed flows. The subclasses are, however, easily distinguishable by analysing the requested path.

The first subclass was observed most often and was identified as communication between a client and a proxy. This subclass can be easily recognised by the substring proxy in the requested path. All the detected hosts were legitimate proxy servers in our network. The clients were located in networks of ISPs in our country, so we suppose the clients were legitimate. The implication for network security monitoring, in this case, is the possibility of proxy detection. We are able to detect active proxy servers in the network and identify their clients. Therefore, we should be able to reveal illegitimate proxy servers in the network or the illegitimate use of a proxy by unauthorised clients, e.g., from networks with a bad reputation [136].

The second subclass was not observed as often but generated the record number of flows. Paths containing a substring such as admin or login suggest a request to a resource protected by authentication. Paths ending in wp-login indicate the presence of an administrative interface of the well-known content management system WordPress, which is prone to brute-force password attacks [137]. This path was observed very often, and the high number of flows, typically over a short period of time, indicates an attack. We accessed the requested URLs and confirmed that the hosts are running WordPress and provide a login page at the requested path. Another well-known content management system, Joomla!, was also a target of brute-force password attacks. The login page of Joomla! is located under the generic-looking path /administrator/index.php, and we have confirmed the presence of Joomla! at the observed URLs.

An interesting conclusion is that the dictionary attacks are rarely accompanied by any other network traffic from the source IP address. There were neither scans nor any other access to the host preceding the attack, as is common among SSH brute-force attacks [138]. We suppose this is due to attackers using the "Google Hacking" technique [125]. In this technique, a search engine (e.g., Google, hence the name) is abused to do the reconnaissance for the attacker. The attacker searches for a string distinctive of a WordPress login page, and the search engine returns a list of WordPress instances which can be accessed instantly. Several well-aimed Google searches confirmed this assumption.

Although we can easily identify the two subclasses by the presence of characteristic substrings, the remaining requested paths are uncategorised and form a third subclass. Paths such as /node/ and /wiki/ do not indicate a vulnerable resource. Dynamically generated content was observed at the requested URLs, which suggests that the requests were legitimate. Highly repeated requests for URLs of downloadable files can be explained by partitioning the download, e.g., by download managers, and thus represent legitimate traffic.

Class II: HTTP Scanners

The second class was marked as HTTP scanning. The requested paths suggest that the scanning guests are searching for specific web applications and their vulnerabilities. It is also common that more paths, versions, or applications are probed during one scan. This type of traffic is easily detectable and highly interesting from a network security perspective. We have not observed any traffic that could be mistaken for a scan or marked as a false positive in this class. Even a low threshold of scanned hosts is sufficient for the detection of a scanner, as not every scanner accesses every host in the network. Fairly good results can be achieved with a threshold set to one-fifth of the number of web servers in the network.

The HTTP scans were further analysed and correlated to TCP SYN scans on port 80. The results confirm the malicious nature of the scans. 46 % of the HTTP scans were preceded or accompanied by a TCP SYN scan of the full range of our network. The first option is that the scanner first obtains the list of IP addresses with port 80 open and then scans them with HTTP requests. The delay between the two phases varies from one hour to several days. The second option of HTTP scanning is to send an HTTP request instantly after receiving a response to the TCP SYN packet.

TCP SYN scanning is easily detectable using network flow monitoring [139], but scanners can avoid detection by scanning "low & slow" [140]. We propose using application flow monitoring to increase the detection rate of network scanning and lower the number of false positives. We observed scanners that scanned no more than 5,000 IP addresses out of a /16 network range with a TCP SYN packet and continued HTTP scanning the web servers they found. Naturally, the scanners did not scan all the web servers; they typically found 100–250 hosts. On the other hand, as we stated earlier, more than 100 scanned hosts is enough to identify a scanner, at least in a network containing around 500 web servers. Therefore, we are able to identify a scanner which would otherwise avoid detection due to the high threshold of the TCP SYN scan detection method.

Another interesting property of the HTTP scans is that some of the requested HTTP paths were observed as targets of brute-force attacks as well. For example, the scanners were looking for running instances of WordPress by requesting paths ending in wp-login. The same paths were subjects of thousands of repeated requests, which we consider a brute-force attack. This leads us to believe that the attackers search for victims of brute-force attacks using the HTTP scans, just as it is common for attackers to perform a port scan before a brute-force attack against services such as SSH [138].
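
The correlation of HTTP scans with TCP SYN scans can be approximated by intersecting the sets of detected scanner addresses. The sketch below assumes that both detectors return sets of source IP addresses; the SYN scan detector itself is not part of this work.

# Which detected HTTP scanners also performed a TCP SYN scan of the network?
# Both arguments are sets of source IP addresses produced by the respective
# detectors; the SYN scan detector is assumed, not defined here.
def correlated_scanners(http_scan_sources, syn_scan_sources):
    both = http_scan_sources & syn_scan_sources
    share = len(both) / len(http_scan_sources) if http_scan_sources else 0.0
    return both, share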

Class III: Web Crawlers

The third recognised class of HTTP traffic was marked as crawlers. The query had to be further specified due to the interchangeability of crawlers and common guests when measured on a single host. Covering more hosts in the network provided us with a set of crawlers active in our network. The crawlers were further analysed to filter out false positives. The filtering methods consisted of querying reverse DNS records, querying the WHOIS database, and checking the User-Agents used in HTTP requests. We confirmed that the identified set was populated by legitimate crawlers. The User-Agents in the HTTP headers pointed to well-known search engines such as Googlebot, Bing, and Baidu. Reverse DNS and WHOIS records further confirmed the presumptions by referring to domain names of the search engine owners.

Crawlers are mostly legitimate and welcome in the network [141]. Almost all the crawlers we discovered were confirmed as legitimate. However, there are two reasons why we included them in a security-related analysis. The first reason is the small number of IP addresses that were identified as crawlers using the flow-based method but about which we were not able to find any further information. Missing reverse DNS records or empty User-Agent fields in HTTP queries lead to the suggestion that these IP addresses performed illegitimate crawling, e.g., e-mail harvesting that aims at discovering spam recipients [142].

The second reason for including crawlers in the analysis is the large number of flows they generate. Any potential detection method based on application flow analysis would have to deal with false positive alerts. Legitimate web crawlers can be easily mistaken for malicious traffic due to their omnipresence and large number of requests. On the other hand, we have to be careful about false negatives, i.e., malicious hosts disguised as crawlers to avoid attention. In addition to this, operators of web crawlers may change the addresses of their bots over time or not publish them at all, which makes creating a whitelist a difficult task. Our method is based on monitoring the behaviour of the potential crawler, which reduces the possibility of false positive or false negative detections. The User-Agent field of the HTTP application flow record can also be used for validation of the results.

4.1.5 Conclusion

We presented an analysis of HTTP traffic using application flow monitoring in a large campus network. Contrary to previously published analyses, we were the first to focus on the security aspects. We identified three security-related traffic classes: repeated requests, HTTP scans, and web crawlers. Repeated requests were further split into brute-force password attacks and proxies according to the observed HTTP request.

This classification is based on an analysis of selected elements of the application flow: source IP, destination IP, and requested URL split into domain and path. Our analysis supported the hypothesis that application flow monitoring with HTTP headers (Layer 7) enables the straightforward detection of current security issues compared to basic flow monitoring performed at Layers 3 and 4. In particular, brute-force password attacks and proxies are hard to differentiate using only basic flow monitoring.

Malicious traffic is contained in the classes of HTTP scans and repeated requests, in which we were able to identify brute-force attacks against well-known content management systems. Web crawlers represent a class of legitimate traffic that can be mistaken for malicious activity due to the large number of flows they generate. However, suspicious crawlers can also be observed. We have also identified the use of proxies as a traffic class. Using a proxy is not malicious by nature, but it is interesting for security enforcement. For example, the presence of an unauthorised proxy may violate the security policy of the network.

The classification is based on counting the number of application flows which share one or more parameters. The classification can thus easily be converted into a threshold-based detection method. We were able to detect malicious HTTP traffic in the large-scale network and process it from one point with a low number of false positives. Ten previously undetectable brute-force password attacks and ten HTTP scans per day were detected on average in our campus network. In addition, previously unknown proxies and web crawlers were observed. The large-scale network monitoring approach proved beneficial particularly for the detection of HTTP scans and web crawlers, which are hardly detectable using only local system logs or local monitoring.

4.2 An Investigation Into Teredo and 6to4 Transition Mechanisms: Traffic Analysis

Despite IPv6 being the standard for several years, its adoption is still in progress [143]. There are several ways of getting IPv6 connectivity, dual-stack being the preferred one. Most IPv6 studies deal with native IPv6. However, there are other globally used options known as transition mechanisms. They can provide IPv6 connectivity on networks without native IPv6 connectivity enabled or without an IPv6-ready infrastructure.

The transition mechanisms tunnel IPv6 traffic through an IPv4 network. Despite their being supported by major operating systems, there is a lack of studies investigating the characteristics of the tunnelled IPv6 traffic. In this context, this section investigates the border traffic of the Czech national research and education network operator (CESNET) and attempts to answer the following question: What are the characteristics of IPv6 transition mechanisms, in terms of their usage, popularity, and impact on native IPv4 and IPv6?

Our research is motivated mainly by the exhaustion of the IPv4 address space, which puts pressure on network operators and content providers to deploy IPv6. The transition mechanisms are used to facilitate IPv6 adoption. Unfortunately, they introduce extra elements into the network, which add to the complexity and decrease performance and security.

As a result, many existing methods for measuring and monitoring large-scale networks become ineffective.

The contribution of our work is threefold. Firstly, we provide an enhanced version of our flow-based IPv6 measurement system prototype, which enables IPv6 visibility in large-scale networks. Secondly, we analyse and show the traffic characteristics of IPv6 transition mechanisms, including the tunnelled traffic. Thirdly, we show how the traffic of IPv6 transition mechanisms has evolved since 2010.

The section is organized as follows. Subsection 4.2.1 outlines related work. Subsection 4.2.2 provides an overview of the Teredo and 6to4 transition mechanisms. Subsection 4.2.3 describes the methodology and measurement setup. Subsection 4.2.4 investigates properties of IPv4 traffic carrying IPv6 payload. Subsection 4.2.6 focuses on the characteristics of encapsulated IPv6 traffic. Subsection 4.2.7 evaluates the use of IPv6 tunnelled traffic. Finally, we draw conclusions in Subsection 4.2.8.

4.2.1 Related Work

The most widely used and discussed tunnelling transition mechanisms are Teredo and 6to4. Although there are several studies focusing on the performance evaluation of transition mechanisms, listed below, the characteristics of the traffic generated by the tunnelling transition mechanisms are not well known.

A study by Aazam et al. [144] provides a performance evaluation and a comparison of the Teredo and Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) mechanisms with a focus on parameters such as throughput, end-to-end delay, round-trip time, and jitter.

A study by Zander et al. [145] compares Teredo tunnelling capability and performance with native IPv6 and 6to4 using measurements related to web services. Teredo increases the time needed to fetch web objects compared to IPv4 or native IPv6. The conclusion is that Teredo seems to be limited by a lack of Teredo infrastructure, forcing encapsulated packets to travel long distances. Moreover, the throughput is partially limited by the performance of Teredo relay servers.

A study by Bahaman et al. [146] discusses the performance of 6to4 with a focus on communication over TCP. It states that the TCP transmission ability is reduced by the use of 6to4. However, it is still suitable for the early stages of the transition period.

Other works discuss the impact of transition tunnels on network security. An RFC by Krishnan et al. [147] presents security concerns with recommendations on how to minimise security exposure due to tunnels. It is pointed out that tunnels can have a negative impact on deep packet inspection and that transition mechanisms such as Teredo allow inbound access from the public Internet to a device through an opening created in a network address translation (NAT) device. This increased exposure can be used by attackers to effectively attack a device hidden behind a NAT device. A frequently proposed security practice is to avoid the usage of tunnels altogether and to deploy other transition schemes such as dual-stack.

Finally, Sarrar et al. [148] provide a brief insight into tunnelled traffic in a study of the impact of the World IPv6 Day on IPv6 traffic. The Teredo and 6to4 transition mechanisms were monitored, and Teredo was discovered to carry mainly control traffic. The study also showed that IPv6 fragments were responsible for a significant portion of 6to4 traffic. The authors suspect that these fragments were caused by broken software which most likely forgot to take the IPv6 header size into account.

4.2.2 Investigated IPv6 Transition Mechanisms

IPv6 traffic is usually divided into native traffic and tunnelled traffic. The tunnelled traffic is considered to be the traffic encapsulated using other protocols, e.g., UDP or IP protocol 41. This division is not necessarily accurate, since traffic that seems to be native IPv6 can, in fact, originate from a client using some transition mechanism like Teredo or 6to4. To clarify this point, we will differentiate between native IPv6 traffic, encapsulated tunnel traffic (IPv4 traffic containing IPv6 payload), and decapsulated tunnel traffic. The word tunnel might be omitted for the sake of brevity.

Teredo and 6to4 are the two most frequently used transition mechanisms in the CESNET network. Mechanisms like ISATAP, AYIYA, and others based on IP protocol 41 do not contribute to the tunnelled traffic significantly and do not appear in our analysis. Therefore, we will not describe them in detail. We did not analyse the NAT64 and DNS64 mechanisms, since they should appear as native IPv6 traffic on the outside.

Teredo [149] is designed to provide IPv6 connectivity to an endpoint behind a NAT device. It requires two network components for operation: relays and servers. Teredo servers are used for the initialisation of Teredo (Figure 4.2, communication 1) and, after that, for opening a port on the user's NAT device in the case of a communication which is not initialised by the user. Relays are used for routing and bridging the IPv4 and IPv6 networks. Each Teredo endpoint uses a statically configured server and a relay, which can cause increased latency and low throughput in the case of a distant server or relay. Teredo uses UDP for packet encapsulation, making the traffic harder to identify.

6to4 [150] is only suitable for hosts with a public IPv4 address. It uses encapsulation in IP protocol 41 packets, hence it is relatively easy to detect and monitor. The 6to4 relay servers act as a bridge between the IPv4 and IPv6 networks. These relays use the anycast prefix 192.88.99.0/24; therefore, the optimal (nearest) relay server should automatically be used for communication.

Figure 4.2 shows traffic between two endpoints (communication 4), one of which uses the Teredo protocol and the other one 6to4. The IPv6 traffic from the Teredo client travels part of its path in a Teredo tunnel, is decapsulated on the edge of the IPv6 Internet, and shortly after that is encapsulated again, this time by 6to4, to travel the rest of its path over the IPv4 Internet to the network of its destination. Depending on where the observation point is located, either the tunnel traffic (Teredo or 6to4) or native IPv6 traffic can be observed.

4.2.3 Methodology and Measurement Setup

To perform a thorough inspection of tunnelled traffic, we need to decapsulate the headers of inner packets. We use the same flow-based framework as in [151], which has been further modified to extract more detailed information from tunnelled data. The main part of the framework is a plug-in which replaces the input and processing parts of the existing flow generator FlowMon Exporter [99]. Every packet is processed to extract basic flow statistics, and the processing of inner headers continues to the point when previously extracted fields indicate the absence of observed encapsulation types.

The Teredo protocol is detected when an IPv6 header is found encapsulated in a UDP packet; AYIYA is searched for in packets on TCP or UDP port 5072. Other protocols are recognised by the IPv6 address format, which is protocol specific, and the 6to4 protocol can additionally be identified by the usage of an IPv4 anycast address belonging to a 6to4 relay. If encapsulation is present, its type and the encapsulated IPv6 header fields are used to extend the set of extracted fields and to identify individual flows taking place inside the tunnel.

Since we need to define new elements for application flow records, the Internet Protocol Flow Information Export (IPFIX) protocol is used. It allows using Enterprise Elements, which can extend the application flow records with additional tunnel information. The framework is able to recognise and extract information from Teredo, AYIYA, and other protocols that are based on IP protocol 41, such as ISATAP, 6to4, and 6over4. We provide the source code of the measuring tool under the BSD license at the project web page [152].
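
The detection logic described above can be summarised by the following simplified classifier over the outer headers. It is a sketch only: the actual implementation is a FlowMon exporter plug-in written in C, and the real parser inspects the inner headers in more detail.

# Simplified sketch of the tunnel-type detection described above. The real
# implementation is a FlowMon exporter plug-in; this mirrors only the
# decision logic for IP protocol 41 tunnels, Teredo (IPv6 in UDP) and AYIYA.
IPPROTO_IPV6 = 41                       # IPv6 directly in IPv4 (6to4, ISATAP, 6over4)
AYIYA_PORT = 5072
SIXTO4_ANYCAST_PREFIX = '192.88.99.'    # 192.88.99.0/24 relay anycast prefix

def classify_tunnel(ip_proto, src_port, dst_port, dst_ip, udp_payload):
    if ip_proto == IPPROTO_IPV6:
        if dst_ip.startswith(SIXTO4_ANYCAST_PREFIX):
            return '6to4'               # traffic towards a 6to4 relay
        return 'proto41'                # refined further by the inner IPv6 address format
    if ip_proto in (6, 17) and AYIYA_PORT in (src_port, dst_port):
        return 'AYIYA'
    if ip_proto == 17 and udp_payload and udp_payload[0] >> 4 == 6:
        return 'Teredo-candidate'       # an IPv6 header found inside a UDP payload
    return None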


Figure 4.2: Teredo and 6to4 Principles: (1) Teredo Start Setup, (2) Teredo Traffic Transiting over IPv4 Network, (3) 6to4 Traffic Transiting over IPv4 Network, (4) Communication between Teredo and 6to4 Endpoint.

Figure 4.3: CESNET Monitored Links (external links to Telia (USA), NIX (CZ), PIONIER (PL), and SANET (SK); nodes in Prague, Brno, and Ostrava).

The resulting flow data provide us with information about the encapsulated source and destination addresses, ports, and transport protocol, which is the common five-tuple used to distinguish individual flows. We respect this principle and thus separate the flows encapsulated in the same tunnel based on the values of these elements. Apart from these key elements, the framework gives information about the Time to Live (TTL), encapsulated HOP limit, TCP flags, and ICMPv6 type and code, when present. Moreover, additional information about the tunnel type is provided, including Teredo header and trailer types when present. The framework also newly supports geolocation using the MaxMind [153] GeoIP database for both outer and encapsulated addresses.


Probe     Bits/s   Packets/s   Flows/s
Telia     1.65 G   274.9 k     22.1 k
NIX       7.17 G   1072.4 k    26.7 k
PIONIER   0.51 G   75.6 k      2.8 k
SANET     1.87 G   242.3 k     5.3 k

Table 4.10: Observation Points IPFIX Statistics.

The data are collected from several observation points located at the borders of the CESNET network by passive probes; see Figure 4.3. All measured lines are at least 10 Gbit/s and together transport about 80,000 flows/s during work hours, which results in total traffic of 15.4 Gbit/s. We use the IPFIXcol [A12] framework to collect the extended flow data over TCP and to store it. New elements can, therefore, be defined and used without any further difficulties or limitations. The IPFIX data was collected over one week in January 2013 without the use of any sampling. Table 4.10 shows the average amount of traffic for all observation points. The total amount of stored data took approximately 2,485 GB of disk space. All statistics presented below are based on flow counts.

4.2.4 Characteristics of IPv4 Tunnel Traffic

In this section, we describe characteristics of IPv4 traffic containing IPv6 payload. The analysis is based purely on information from IPv4 headers, and the extended flow data are only used to identify the relevant flows accurately. Three characteristics that can give us insight into tunnelled traffic are addressed. Firstly, we describe TTL values of the various traffic sets, and then we look into a geolocation aspect. Lastly, the basic flow statistics are presented.

Frequency of TTL and HOP Values

We study the distribution of TTL values of the observed flows. It is known that some operating systems use specific values, as shown in [154]. Windows has the default TTL set to 128. The value of 64 is mostly used by Mac OS X and Linux devices, including devices running Android. We expect that these operating systems form a majority and are therefore the most significant.

Figure 4.4 shows the most frequently used TTL values for IPv4 flows carrying IPv6 payload. The TTL values are most frequent near the values set by OS vendors, and the frequency decreases rapidly within less than ten hops. Therefore, we assume that most of the packets reach their destination in less than 32 hops. Thus, we classify the flows according to their TTL values into four significant groups. Windows traffic seems to be the most frequent, taking 60.3 % of the total, while Linux machines are not present so often, with only 23.8 %.

Apart from the Windows and Linux ranges, there are devices that set the TTL to 255 and 32. Although the value 255 usually indicates Cisco routers, in the case of tunnelled traffic we observed that the 6to4 traffic from anycast addresses has the TTL set to 255 as well. The portion of the 255 range is 3.8 %, and most of it originates from 6to4 relays. TTL values 24 and 26 are dominant in the group of values from 1 to 32, which makes up 12.2 % of the total number of flows. We discovered that this is caused by a 6to4 tunnel that passes two observation points. The tunnel is heavily used and causes a large portion of the tunnelled traffic, which also affects other 6to4 measurements.

Overall, the TTL distribution of all observed IPv4 traffic is different, as shown in Figure 4.5. The Linux portion of the traffic is higher than the rest, and the TTL values of 32 and 255 are not as significant. A more detailed examination of the flow records shows that this is caused mainly by a high ratio of HTTP traffic. Even though there are more clients using the Windows operating system, most of the web servers are based on Linux, and therefore the responses have a TTL lower than 64. The DNS protocol shows similar characteristics, except that the DNS traffic that we observe at our metering points is mostly generated by recursive domain name servers. Since Linux DNS servers are the most widely used, they significantly contribute to the traffic generated by Linux machines.
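
The TTL bucketing used for this classification can be expressed as a small helper; the range boundaries follow the groups used in Figures 4.4 and 4.5.

# TTL bucketing used for Figures 4.4 and 4.5. Default initial TTLs of 64
# (Linux, Mac OS X, Android), 128 (Windows) and 255 (e.g. 6to4 relays)
# decrease by one per hop, which motivates the four ranges.
def ttl_group(ttl):
    if 1 <= ttl <= 32:
        return '1-32'
    if 33 <= ttl <= 64:
        return '33-64'      # typically Linux, Mac OS X or Android senders
    if 97 <= ttl <= 128:
        return '97-128'     # typically Windows senders
    if 224 <= ttl <= 255:
        return '224-255'    # initial TTL 255 (6to4 relays, some routers)
    return 'other'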



Figure 4.4: TTL Value Distribution of IPv4 Traffic Containing IPv6 Payload.


Figure 4.5: TTL Value Distribution of Total Observed IPv4 Traffic.

IPv6 uses a HOP limit instead of the TTL. Figure 4.6 shows the HOP limit distribution of native and decapsulated IPv6 traffic. Unlike IPv4, the HOP limit of 64 is the most frequent. We assume that Linux-based machines use the default HOP limit of 64 and Windows machines use the default HOP limit of 128. This setting can be overridden by Stateless Address Autoconfiguration. Therefore, clients in managed networks (e.g., universities) might have the HOP limit set to a different value, regardless of their operating system. We verified this fact on several Linux and Windows-based machines. Due to the significant share of HTTP(S) in IPv6 traffic, a large portion of Windows traffic is expected. Since the share of HOP limit 128 is negligible, we expect that the common HOP limit in the observed IPv6 networks is set to 64.

Location of IPv4 and IPv6 Endpoints

The second characteristic that we evaluate is the geolocation aspect of the IPv4 and IPv6 traffic. We focus on data from the Telia link only, which connects the CESNET network to the United States. This highlights the differences in geolocation characteristics better. The statistics are computed separately for the incoming and outgoing lines and are shown in Figure 4.7. IPv4 is more symmetric than IPv6, since the country statistics for both directions are similar. This is normal behaviour, since most of the requests initiate a response and the routes are also symmetric.
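
The per-country statistics behind Figure 4.7 boil down to a GeoIP lookup per address. The following sketch assumes the MaxMind GeoLite2 country database and the geoip2 Python library; the measurement framework itself performs the lookup inside the exporter instead.

from collections import Counter
import geoip2.database
import geoip2.errors

# Per-country flow counts for a list of IP addresses, assuming the MaxMind
# GeoLite2 country database and the geoip2 library (an assumption of this
# sketch; the framework performs the lookup inside the exporter).
def country_distribution(addresses, db_path='GeoLite2-Country.mmdb'):
    counts = Counter()
    with geoip2.database.Reader(db_path) as reader:
        for addr in addresses:
            try:
                counts[reader.country(addr).country.iso_code or '-'] += 1
            except (geoip2.errors.AddressNotFoundError, ValueError):
                counts['-'] += 1        # e.g. link-local addresses (fe80::/10)
    return counts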



Figure 4.6: HOP Value Distribution of IPv6 Traffic.

Figure 4.7: Top 10 Country Distribution for Native IPv4, IPv6 and Decapsulated IPv6 Addresses; (a) Incoming Traffic, (b) Outgoing Traffic.

The IPv6 traffic has different properties. We discovered that the addresses that cannot be geolocated are mostly link-local addresses (fe80::/10) or link-local multicast addresses (ff02::/16). Such addresses should not be routed at all, which indicates that there are routers with an erroneous IPv6 configuration. This misconfiguration also causes the asymmetry of the traffic, as such requests cannot be answered.


Figure 4.8: CCDF of Flow Duration.

Figure 4.9: CCDF of Packets per Flow.

4.2.5 Duration and Size of Flows

The third group of characteristics comprises flow duration, the number of packets per flow, and packet size (bytes per packet). For the evaluation we employ the empirical complementary cumulative distribution function (CCDF). We use the following formula to compute the CCDF values:

\bar{F}(x) = P(X > x) = 1 - \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{x_i \le x\}    (4.5)

where 1{x_i ≤ x} is the indicator of whether the event {x_i ≤ x} has occurred. The CCDF describes how often the selected variable exceeds a particular level. From all traces we filtered out four subsets of traffic: TCP or UDP encapsulated traffic (TCP/UDP), all encapsulated traffic (ALL), IPv6 native or decapsulated traffic (IPv6), and IPv4 traffic (IPv4). The subsets were chosen to compare the tunnelled traffic with other common traffic types. Further, for each of the subsets and each of the characteristics, the CCDF has been computed. Figures 4.8, 4.9, and 4.10 show the calculated CCDFs. The majority of flows in all subsets (Figure 4.8) have durations shorter than 10 seconds; flow durations longer than 10 seconds represent only 11 % or less. The TCP/UDP subset contains much fewer short-duration flows (TCP/UDP: 32.16 % ≤ 0.01 sec.; ALL: 54.66 %; IPv4: 59.84 %), which is explained by the absence of the connection-maintaining flows necessary for the other subsets. We can observe slightly increased frequencies of flow durations between 7 and 11 seconds, especially in the TCP/UDP and ALL subsets. Hence, the tunnelled traffic generally contains fewer short-duration flows than IPv4 or IPv6 traffic.
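A minimal sketch of how the empirical CCDF from Equation 4.5 can be computed is shown below. It assumes the per-flow values (durations, packet counts, or packet sizes) have already been extracted into an array; the variable names are illustrative and NumPy is used only for brevity.

import numpy as np

def empirical_ccdf(values):
    """Return (x, P[X > x]) pairs for a sample of per-flow values."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # For sorted samples, F(x_(i)) = i / n, so the CCDF is 1 - i / n.
    ccdf = 1.0 - np.arange(1, n + 1) / n
    return x, ccdf

# Example: CCDF of flow durations in seconds.
durations = [0.01, 0.02, 0.5, 3.0, 9.8, 42.0]
x, p = empirical_ccdf(durations)
print(list(zip(x, p)))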



Figure 4.10: CCDF of Packet Size.

The packets-per-flow distribution (Figure 4.9) shows that single-packet flows form only 31.6 % of the TCP/UDP subset, 51.42 % of ALL, and 57.99 % and 55.82 % of the other two subsets. We expected tunnelled traffic to behave similarly to IPv6 traffic. Nevertheless, we can observe a vertical shift between the ALL and IPv6 CCDFs. This shift is caused by single-packet flows, mainly DNS traffic to root DNS servers, which uses the IPv6 protocol. The slope of the CCDF for TCP/UDP and ALL is steeper than the slope for the other subsets, which indicates higher frequencies of certain packet counts. In conclusion, the distribution of packet counts of the tunnelled traffic is slightly different from the distribution of the IPv4 and IPv6 traffic. The last characteristic described by a CCDF is packet size (Figure 4.10). In the case of encapsulated traffic, we consider the outer packet size including the encapsulation header. An earlier study of the packet size distribution mentions a significant difference between the CCDFs of packet sizes: the authors of [155] state that the distribution of IPv4 traffic fits a heavy-tailed distribution, whereas IPv6 traffic does not. We expect the tunnelled traffic to have characteristics similar to IPv6 traffic, and thus the CCDFs are expected to be similar, too. Figure 4.10 shows some discrepancies with this hypothesis. Packets larger than 400 bytes represent only 5 % of the tunnelled traffic, while in IPv6 the portion is still 15 %. Furthermore, packets smaller than 70 bytes are not present in the tunnelled traffic, although they account for 26 % of IPv6. This is caused by the shift of the graph towards a higher packet size given by the encapsulation: the IPv4 header is usually 20 bytes long, and in the case of Teredo there are another 8 bytes for the UDP header. Even taking this shift into account, there is still a noticeable drop at the size of 200 bytes, which is caused by a high share of ICMPv6 and BitTorrent control traffic.

4.2.6 IPv6 Tunneled Traffic Analysis

In this section, we focus on characteristics of encapsulated IPv6 traffic for which we use the same dataset as in Section 4.2.4. We show HOP limit statistics, detected Teredo servers and geolocation characteristics. We discuss the most used TCP and UDP ports inside the tunnels.

Distribution of HOP Limits

Figure 4.11 shows the HOP limit distribution for the encapsulated IPv6 traffic. The main difference from the TTL (Figure 4.4) and HOP limit (Figure 4.6) statistics is that the values here are distributed with much lower entropy. The limits 21, 64, 128 and 255 are the most frequent ones. This is caused by the fact that most of the traffic has never traversed the IPv6 network, so the HOP limit was never decreased. When the values are lower, we can be reasonably certain that the packets have already traversed the IPv6 network and are heading towards the IPv4 destination. The value 21 is used for Teredo bubbles by Windows 7 with Service Pack



Figure 4.11: HOP Value Distribution of Encapsulated IPv6 Traffic.

1 and earlier. Teredo bubbles are used as a special mechanism for NAT traversal, which is consistent with the fact that most of the clients are behind a NAT. We can see that some packets reach a HOP limit of zero, which is a known problem when the HOP limit is set as low as 21. The value of 255 is used for IPv6 neighbour discovery messages, so that when a host receives such a packet with a HOP limit lower than 255, the packet is considered invalid [156].

Teredo Servers

There are two ways of detecting Teredo servers. Firstly, we can look at the traffic using the UDP protocol on port 3544, which is the well-known Teredo port, and select the addresses that communicate most often. The shortcoming of this approach is that some other services might be using the Teredo port and therefore the results might not be accurate. Since we are able to decapsulate Teredo traffic, we can derive the IPv4 addresses of Teredo servers directly from Teredo IPv6 addresses. This way we can even detect Teredo servers that are not communicating directly through our observation points. Table 4.11a shows the Top 10 servers that were discovered in the encapsulated IPv6 addresses. Using the WHOIS database, we confirmed that a majority of the servers are operated by Microsoft, which is only to be expected since Teredo is a Microsoft technology. Most of the Microsoft Teredo servers we identified are actually IP addresses of "teredo..microsoft.com", which is the default Teredo server name configured under Windows. The address 83.170.6.76 has a hostname indicating that it serves as a Miredo server (a Teredo implementation for Linux and BSD). The last address belongs to CZ.NIC (the Czech top-level domain operator), which is known to promote IPv6 deployment in the Czech Republic and operates local Teredo and 6to4 servers. Table 4.11b shows Teredo servers discovered as the most active on Teredo port 3544. This way we detect only Teredo servers that are establishing connections through our observation points. We can see that most users use Teredo servers in the United States or Great Britain to get IPv6 connectivity. This is known to increase the latency of such connections, and therefore we would recommend that Czech users use local servers, such as the CZ.NIC servers.
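Deriving the Teredo server address from a Teredo IPv6 address follows directly from the address format defined in RFC 4380: the server's IPv4 address occupies bits 32-63 of addresses under the 2001:0::/32 prefix. The sketch below illustrates this derivation; the example address is synthetic.

import ipaddress

TEREDO_PREFIX = ipaddress.ip_network("2001::/32")

def teredo_server(ipv6_str):
    """Return the Teredo server IPv4 address embedded in a Teredo IPv6 address, or None."""
    addr = ipaddress.IPv6Address(ipv6_str)
    if addr not in TEREDO_PREFIX:
        return None
    packed = addr.packed                         # 16 bytes in network byte order
    return ipaddress.IPv4Address(packed[4:8])    # bits 32-63 hold the server address

# Example with a synthetic Teredo address embedding the server 65.55.158.118:
print(teredo_server("2001:0:4137:9e76::1"))      # -> 65.55.158.118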

Location of Tunnel Endpoints

The geolocation statistics of tunnelled traffic are computed for Teredo and 6to4. We use the encapsulated IPv6 addresses to determine the countries for each flow. Incoming and outgoing traffic is taken separately, just as in Figure 4.7. The statistics are shown in Figure 4.12. The tunnelled traffic shows very different geolocation characteristics compared to native and decapsulated IPv6 traffic, even though both are observed on the same link. Most of the native and decapsulated IPv6 communication takes place inside the EU, while a large portion of the tunnelled traffic communication is performed with the USA and Russia.


Server IP         Ratio    Owner       Country
65.55.158.118     28.33 %  Microsoft   US
94.245.121.253    27.98 %  Microsoft   GB
157.56.149.60     26.49 %  Microsoft   US
157.56.106.184    10.18 %  Microsoft   US
94.245.115.184     6.41 %  Microsoft   GB
83.170.6.76        0.04 %  B. Schmidt  DE
170.252.100.131    0.01 %  Accenture   US
94.245.127.72      0.01 %  Microsoft   GB
94.245.121.251     0.01 %  Microsoft   GB
217.31.202.10      0.01 %  CZ.NIC      CZ

(a) Based on Teredo IPv6 Addresses.

Server IP         Ratio    Owner       Country
94.245.121.253    43.24 %  Microsoft   GB
65.55.158.118     18.91 %  Microsoft   US
157.56.149.60     17.86 %  Microsoft   US
94.245.115.184    10.00 %  Microsoft   GB
157.56.106.184     6.50 %  Microsoft   US
94.245.121.254     0.72 %  Microsoft   GB
94.245.115.185     0.22 %  Microsoft   GB
65.55.158.119      0.18 %  Microsoft   US
83.170.6.76        0.18 %  B. Schmidt  DE
157.56.149.61      0.17 %  Microsoft   US

(b) Based on UDP Port 3544.

Table 4.11: Top 10 Discovered Teredo Servers.

To identify applications that are using the IPv6 connectivity provided by transition mechanisms, we created a list of the most used encapsulated TCP and UDP ports. We observed several ports that can be found in both the source and destination port Top 10. The source and destination port Top 10 represent 32.0 % and 40.5 % of the traffic, respectively. The well-known ports are HTTP (80; 0.85 % of traffic as source port, 5.61 % as destination port), HTTPS (443; 0.58 % and 1.48 %) and DNS (53; 1.49 % and 1.48 %). Among the most frequent ports are ports 49001 (15.96 % and 9.91 %) and 51413 (10.12 % and 16.07 %), which are used by BitTorrent clients (namely Vuze and Transmission). We discovered that these ports are heavily used within the 6to4 tunnel mentioned in Section 4.2.4.

4.2.7 Evaluation of IPv6 Adoption

In this section, we describe the deployment of the IPv6 protocol with respect to tunnelled traffic. The overall statistics of IPv6 and tunnelled traffic are presented, and we provide a historical comparison to our previous measurement [151]. The network activity shows a correlation with human activity. Both IPv6 and tunnelled traffic are considerably smaller during the weekend than during weekdays. As for the IPv6 traffic, the increase in traffic volume starts at 6 AM, reaches its peak around 11 AM and holds this high level till 4 PM. Then the traffic steadily decreases and reaches its minimum at 3 AM the next day. The tunnelled traffic shows a slow increase that starts at 6 AM and peaks around 6 PM. The decrease begins at 10 PM and reaches the minimum at 5 AM the next day. The possible cause of this shift from the IPv6 diurnal pattern is the fact that the tunnelled traffic is largely generated by BitTorrent clients.


Figure 4.12: Top 10 Country Distribution for Encapsulated IPv6 Addresses: (a) Incoming Traffic, (b) Outgoing Traffic.

We measured the tunnelled traffic back in 2010 on three CESNET border links to SANET, PIONIER and NIX. We found that tunnelled IPv6 was responsible for 1.5 % of total flows, which is the same share as we measure today. However, the relative amount of bytes transferred has almost doubled, from 0.66 % to 1.28 % of total bytes today. The share of native and decapsulated IPv6 was only 0.10 % (0.21 % of bytes), compared to 3.39 % (4.42 % of bytes) today. The known services by port (HTTP, HTTPS and DNS) had a share of less than 1 % of total flows. Today's share of these services is significantly higher (see Section 4.2.6). The measurements show that the overall usage of both tunnelling IPv6 transition mechanisms and native IPv6 has been rising. We distinguish between the encapsulated and decapsulated tunnelled traffic, as mentioned in Section 4.2.2. The decapsulated tunnelled traffic is included in the measured IPv6 traffic. When we filter out the decapsulated Teredo and 6to4 traffic, they together account for 5.91 % of the measured IPv6 traffic. Teredo makes up 83.13 % of the decapsulated tunnelled traffic and 6to4 the remaining 16.87 %. Hence, we should not consider all measured IPv6 traffic as native IPv6 traffic. In the previous measurement, the main contributor to tunnelled traffic was Teredo with an occurrence of nearly 89 %, followed by 6to4 with over 11 %; therefore the relative amount of 6to4 traffic has increased. We detected the use of 13 Teredo servers in the previous study, compared to 53 today.


4.2.8 Conclusion

In this section, we have taken a detailed look at the IPv6 transition mechanisms. We have provided an improved version of our tool for investigating IPv6 tunnelled traffic. Considerable progress has been made with regard to understanding tunnelled traffic behaviour, especially concerning Teredo and 6to4 traffic. The results of this section suggest that encapsulated traffic differs from IPv4 and IPv6 traffic in several characteristics, including TTL values, geolocation aspects, and flow duration. Moreover, we have provided a list of Teredo servers and described the evolution of IPv6 adoption.

4.3 Large-Scale Geolocation for NetFlow

Geolocation of network traffic is the process of identifying the geographical location of hosts using their IP addresses. The addition of this information to network traffic is useful for many activities, such as business advertisements, fraud detection and access control. Several types of geolocation can be identified. Active geolocation estimates the location of hosts mostly based on delay measurements [157, 158]. This results in high accuracy but comes at the expense of a lack of scalability and high measurement overhead [159]. On the other hand, passive geolocation relies on static datasets, such as databases, containing the geolocation information per IP address block. Online databases, such as geoPlugin [160], can be accessed easily from Web applications and often apply rate or request limiting. This makes them less suitable for bulk geolocation and high-interaction applications. Offline databases, such as MaxMind GeoLite [161] and IP2Location [162], do not have these limitations. Both the fact that databases can become outdated and the fact that the majority of the data used by passive geolocation approaches refers only to a few popular countries impact the accuracy of these approaches [159]. Existing geolocation approaches are designed for on-demand, mostly small-scale purposes, where the geolocation is performed by analysis applications that retrieve the flow data from collectors. However, when geolocation needs to be performed in large, high-speed networks, these approaches are not scalable anymore. The main reason for this is that the dataset to be geolocated is too large for these applications, mainly because multiple flow exporters are typically exporting flow data to a single collector. Also, these applications are often developed for data visualisation rather than bulk processing. The goal of this section is to demonstrate how flow-based geolocation can be performed in a large-scale fashion. As a first step, we propose a prototype for exporter-based geolocation, which adds geolocation data to flow records before they are sent to a collector, an approach also used in the previous section. As such, the actual geolocation can be distributed over multiple exporters instead of being deployed on a single collector. Since data analysis is almost always done at a collector or by analysis applications that retrieve data from the collector, we also propose an extension to the state-of-the-art flow collection and analysis tool NfSen [73] that adds native geolocation support. We define native geolocation support as the ability to process geolocated flow data in the same way as non-geolocated flow data with respect to storage, filtering, aggregation, and statistics generation. After presenting the two prototypes, we analyse the performance footprint of our approaches on the primary tasks of flow exporters and collectors and present several use cases that demonstrate the practical applicability of large-scale geolocation. The remainder of this section is structured as follows. Subsection 4.3.1 provides an overview of related work in the field of flow-based geolocation. The main contribution of this work is described in Subsections 4.3.2 and 4.3.3, where we present how we perform large-scale geolocation on flow exporters and collectors, respectively. In Subsection 4.3.4 we describe the deployment of our prototypes, and in Subsection 4.3.5 we present several use cases which demonstrate their viability. Finally, we close this section in Subsection 4.3.6, where we draw our conclusions.


4.3.1 Related Work

Due to the increasing number of application areas for geolocation, many related works have been developed in the past. Closest to our work is nProbe, a NetFlow/IPFIX probe for exporting NetFlow data to a flow collector [163]. This flow exporter uses MaxMind GeoLite [161] for the conversion of IP addresses to geographical locations and AS numbers. The fact that geolocation by nProbe is exporter-based makes it scalable for use in large-scale networks. However, by storing the geolocation information in special fields, software support by collectors and applications is required to access this data. Geolocation by nProbe is therefore not transparent to any flow collector, which makes this approach different from our work. Other related works, such as SiLK [164], ntop [165], and Argus [166], are collector-based. SiLK and ntop add geolocation information to flow data both in an on-demand fashion and as a post-processing step. As such, geolocation information is used purely for visualisation and native geolocation is not supported. Argus is more advanced, since it creates an extended dataset from the NetFlow data in which it can also store geolocation data. As such, full geolocation support is provided as long as the geolocation data has been added to the dataset as a post-processing step beforehand. Besides exporter- and collector-based geolocation approaches, dedicated analysis applications have also been developed. These applications retrieve flow data from flow collectors and subsequently perform the geolocation. SURFmap [167, 168], a plugin for the flow collector NfSen, uses geolocation to visualise network traffic on a map using the Google Maps API. Due to a dependency on the Google Maps Geocoder for translating location names to coordinates, only a limited number of queries can be executed per day. A geofilter has been included since version 2.3, which aims to filter flow data based on country, region or city keywords as a post-processing step. The specification of this geofilter is one of the core elements of our collector-based geolocation prototype. Another application is HAPviewer [169], which is able to provide flow-level statistics per country of communication partners. Both SURFmap and HAPviewer perform geolocation on demand and do not provide native geolocation support.

4.3.2 Exporter-Based Geolocation

Flow exporters can nowadays perform more tasks than just flow export, due to their increasing performance. We propose a prototype for exporter-based geolocation, which adds geolocation data to flow records in a way that is transparent to standard flow collectors. This means that the geolocation data can be accessed by any standards-compliant collector. Usually, multiple flow exporters send their data to a single flow collector. An exporter-based approach therefore aims to distribute the geolocation process over multiple devices. This improves the overall scalability in large networks and reduces the performance footprint on flow collectors. Our exporter-based geolocation prototype has been developed as a plugin for the FlowMon [99] platform. Plugins can be used to alter flow creation, processing, filtering, and export. The architecture of FlowMon is shown in Figure 4.13a. Packets on the line are received by input plugins that store newly created flow records in the flow cache. As soon as a record in the cache has expired (e.g., due to a timeout), it is removed from the cache, after which the export plugin takes care of placing it in NetFlow packets. Our geolocation plugin is depicted as GeoPlugin, which is executed by the export plugin just before the record is exported. GeoPlugin performs the actual geolocation and adds the resulting data to the flow record. To make sure that our exporter-based geolocation is transparent to any flow collector, NetFlow v9 has been chosen as the export protocol. It provides a fixed set of fields that can be used to store information about a flow. To add country-level information to flow records, we either have to use reserved (vendor proprietary) fields or reuse some of the existing fields. Only country-level geolocation information is considered in this work, due to the poor accuracy of geolocation databases with respect to region- and especially city-level geolocation data [159].



Figure 4.13: Prototype Architectures for Large-Scale Geolocation: (a) Exporter-Based Geolocation, (b) Collector-Based Geolocation.

NetFlow's reserved fields are typically not supported by collectors, which leaves the reuse of existing fields as the only option to ensure transparency, compatibility, and early deployment. Since the SRC_AS and DST_AS fields are rarely used in typical flow monitoring setups (due to the lack of BGP data integration), we use these fields for storing the source and destination countries of IP addresses in flow records. After retrieving the geolocation data from the MaxMind GeoLite database, the resulting country code is converted to a numerical value and stored in these fields. When flow collectors and analysis applications need to access the country-level information in the flow records, the SRC_AS and DST_AS fields need to be parsed and the values translated back to country codes. There are many advantages to using the IPFIX protocol for exporting geolocation information. IPFIX is more flexible than NetFlow v9, supports more fields (named IPFIX Information Elements), and makes it easier to define enterprise-specific fields. Several fields for adding location information to IPFIX have already been proposed in [170]. However, those fields are only used for storing the location of an IPFIX flow exporter and are therefore not suitable for flow geolocation. Besides the flexible definition of fields, genuine IPFIX collectors are able to receive new fields and should be flexible enough to process them. However, no suitable IPFIX collectors were available at the time of writing, and reusing NetFlow v9 fields served our purposes well, even though this approach is strongly discouraged in general. Note that switching to the IPFIX protocol is only a matter of adjusting the flow exporter configuration.
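The lookup and encoding step can be sketched as follows. The snippet uses the current geoip2 Python API purely as a stand-in for the legacy GeoLite C API used by the plugin, and the database path as well as the two-letter packing scheme are illustrative assumptions rather than the plugin's exact encoding.

import geoip2.database
import geoip2.errors

# The database path is an assumption for this example.
reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def country_code(ip):
    """Return the ISO 3166-1 country code for an IP address, or None."""
    try:
        return reader.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return None

def pack_country(code):
    """Pack a two-letter country code into a 16-bit value for an AS field."""
    return (ord(code[0]) << 8) | ord(code[1]) if code else 0

def unpack_country(value):
    """Inverse mapping used when reading the SRC_AS/DST_AS fields back."""
    return chr(value >> 8) + chr(value & 0xFF) if value else None

# Example: geolocate the source address of a flow and fill the AS field.
src_as_value = pack_country(country_code("217.31.202.10"))
print(src_as_value, unpack_country(src_as_value))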

4.3.3 Collector-Based Geolocation

Flow collectors typically aggregate flow data from multiple flow exporters, which makes them a suitable location to perform data analysis. NfSen is a popular collector and analysis tool, used by many network administrators in large-scale networks where performance and stability are essential. nfdump, an analysis tool included in NfSen that takes care of the actual data analysis, uses a flat-file database with an extensible format1 to read flow data. Several extensions have been included before, such as SNMP interfaces, AS numbers, MAC addresses, and VLANs. We propose a new extension for storing source and destination country codes (based on ISO 3166-1) for the IP addresses in flow records. For the same reason as explained for our exporter-based approach, only country-level information is considered. Besides the database extension, support for filtering, aggregating, and statistics generation based on the geolocation data also needs to be added to nfdump to provide native geolocation support. Besides nfdump, NfSen includes several other tools that need to be modified to provide native geolocation support in a flow collector. Among them is nfcapd, which receives flow data from flow exporters, performs the actual geolocation (using MaxMind GeoLite) and writes the data to the disk.

1. The extensible format is supported by nfdump since version 1.5.7.


Figure 4.14: Screenshot of Collector-Based Geolocation Prototype.

Obviously, nfcapd also needs to support the new flat-file database format of nfdump. Other modifications need to be made to nfprofile, which performs the traffic profiling for NfSen. The overall architecture of our collector-based geolocation prototype and the data flow between the various components is shown in Figure 4.13b. A screenshot of our prototype is shown in Figure 4.14, which demonstrates one aspect of the native geolocation support: the traffic is now automatically profiled by country names. The introduced geolocation support is completely transparent to and compatible with other parts of NfSen, and provides near real-time, long-term traffic profiling based on geolocation. No additional tools are required. There are ongoing discussions with Peter Haag, the developer of NfSen and nfdump, about the integration of our geolocation modifications into the main source tree of these tools.

4.3.4 Prototype Deployment

The two previous sections have discussed our approaches to exporter- and collector-based geolocation. Both approaches aim to translate IP addresses to geographical locations in a scalable manner, for deployment in large-scale networks. To validate whether this aim is fulfilled, we have deployed both prototypes on the 10 Gbps Internet connection of Masaryk University (CZ), which connects the campus network to the Czech national research and education network CESNET. The primary tasks of flow exporters and collectors are data export and collection, respectively. Although geolocation adds a useful new dimension to this data, it should never interfere with the primary tasks of these devices. To analyse the performance footprint of our geolocation, we have measured the number of geolocation queries that can be performed per second. MaxMind GeoLite provides four data retrieval types, namely one file system-based (standard) and three memory-based (memory cache, check cache and MMAP cache) ones. Besides them, both IPv4 and IPv6 address geolocation have been tested, since they use different databases with different schemas. The results of the measurement2 are shown in Figure 4.15 and reveal a clear performance increase when either memory cache or MMAP cache is used: up to 15.7 · 10⁷ IPv4 and 5.4 · 10⁷ IPv6 addresses could be geolocated per second. Since flow records consist of two IP addresses, roughly 7.8 · 10⁷ IPv4 and 2.7 · 10⁷ IPv6 flow records can be geolocated per second. However, geolocation using memory-based retrieval methods is CPU-intensive and consumes up to 100 % of the CPU time.

2. We have used a machine with the following configuration: Intel Xeon E5410 CPU at 2.33 GHz, 12 GB RAM, SATA disk with 7200 RPM and Linux kernel 2.6.32 (64 bit).



Figure 4.15: MaxMind GeoLite Country Database Performance.


Figure 4.16: Incoming TCP SYN-Only Flows.

The derived numbers for the geolocation performance can therefore never be reached in practice. In a theoretical case where either the flow export or collection process may consume 50 % of the CPU time, roughly 3.9 · 10⁷ IPv4 and 1.4 · 10⁷ IPv6 flow records can be geolocated per second. As a result, the performance of exporters and collectors in our deployment setup is not affected in any way, as we have up to 6.0 · 10³ flow records per second to process. Neither will it be on the backbone links of CESNET, where up to 45 · 10³ flow records are exported and collected per second.

4.3.5 Use Cases

In this section, we present analysis results from our collector-based geolocation prototype, organised into two use cases. It is based on the deployment described in Subsection 4.3.4. The first use case demonstrates the applicability of our (pre-processed) geolocation approaches for anomaly detection. The second use case presents week-long traffic profiling at the country level.

Anomaly Detection

It is a difficult task for network administrators to maintain security awareness amid the daily barrage of scans, spamming hosts, zero-day attacks, and malicious network users hidden in the huge traffic volumes crossing the Internet. When security teams use geolocation for incident analysis, it is usually applied as a post-processing step. In contrast, our approaches perform geolocation as a pre-processing step, which allows, among other things, using the geolocation data directly in the detection process.



Figure 4.17: Geolocated and Non-Geolocated ICMP Traffic: (a) ICMP, (b) Geolocated ICMP.

An example is shown in Figure 4.16, where the number of TCP SYN-only flows3 per second during a period of one day is shown. These flows are typically created during the propagation phase of malware. We have identified that more than 40 % of them originate from China. Also, the vast majority of traffic spikes is caused by Chinese connection attempts, and only 22 % of the Chinese TCP traffic completed the TCP handshake. These findings demonstrate that this particular Chinese traffic is an interesting dataset for anomaly detection and malware analysis. Further investigation has revealed that using geolocation as a pre-processing step for anomaly detection can yield a dataset to be analysed that is only 40 % of its original size in this particular case, resulting in faster and less complex data analysis. This allows detecting anomalies that normally stay below the thresholds of anomaly detection systems. By creating traffic profiles for countries that are known to generate a higher than average amount of malicious traffic, such as China, Russia, Taiwan, Korea, and Thailand [171], we have been able to detect long-term stealth attacks. Another example that demonstrates the advantages of native geolocation support in a flow collector is shown in Figure 4.17. Figure 4.17a shows an ordinary time series of ICMP traffic for a period of twelve hours, while Figure 4.17b shows the same traffic geolocated. Several anomalies, marked by numbers in the figures, can be identified; the geolocated traffic makes the identification clearer due to the larger relative differences between anomalous and non-anomalous traffic. Peak 1 (incoming traffic) is caused by a Ukrainian host scanning the complete university network using ICMP ECHO.

3. We denote TCP flows that consist of a single packet with only the SYN flag set as TCP SYN-only flows.



Figure 4.18: Distribution of Incoming Traffic over Countries (1 Week): (a) IPv4, (b) IPv6.

Replying hosts were later contacted on TCP port 4899. Peaks 2, 3, and 4 (outgoing traffic) are caused by foreign hosts sending spoofed UDP traffic to the university's DNS server to perform an amplification attack. This traffic is blocked by the firewall, and an ICMP Destination Unreachable message is returned to the spoofed source IP address located in the United States. When network administrators analyse anomalies using plots such as the one in Figure 4.17a, they can identify the peaks and start filtering the data to determine the hosts involved in an anomaly. However, plots similar to the one in Figure 4.17b make these tasks faster and simpler: since the traffic is already pre-processed by country names, only the datasets related to a specific country need to be analysed. In the case of Peak 2, for example, this results in a dataset that is only half of its original size.
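The country-based pre-filtering used at the beginning of this use case can be illustrated with a short sketch: select TCP SYN-only flows and count them per source country. Flow records are assumed to be dictionaries with illustrative field names; the actual prototype operates on nfdump records.

from collections import Counter

TCP = 6
SYN_ONLY = 0x02   # TCP flags value with only the SYN bit set

def is_syn_only(flow):
    """A TCP SYN-only flow: a single packet carrying only the SYN flag."""
    return (flow["proto"] == TCP
            and flow["packets"] == 1
            and flow["tcp_flags"] == SYN_ONLY)

def syn_only_by_country(flows):
    """Count TCP SYN-only flows per source country code."""
    return Counter(f["src_country"] for f in flows if is_syn_only(f))

# Example usage with two synthetic flow records:
flows = [
    {"proto": 6, "packets": 1, "tcp_flags": 0x02, "src_country": "CN"},
    {"proto": 6, "packets": 5, "tcp_flags": 0x1A, "src_country": "CZ"},
]
print(syn_only_by_country(flows))   # Counter({'CN': 1})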

Traffic Profiling

Traffic profiling is the process of analysing the distribution of services and protocols in a network, such as the number of packets related to Web traffic or the number of flows of a certain type. When geolocation is applied, statistics related to geographical locations can also be generated, such as the number of connections to a certain country. Although this is also possible using existing geolocation approaches based on post-processing, our approach is able to generate these statistics in real time and for the complete traffic. In this subsection, we provide two examples of geolocation-based traffic profiling: IPv4 vs IPv6 usage statistics, and HTTPS profiling.



Figure 4.19: Distribution of HTTPS Traffic over Countries.

One method for measuring the worldwide spread of IPv4 and IPv6 deployment is based on counting the number of IPv4- and IPv6-enabled ASNs, respectively. Another method is to measure the percentage of IPv4 and IPv6 traffic per country. For our measurement point in the Czech Republic, the distribution of IPv4 traffic source countries is shown in Figure 4.18a. The fact that the United States is the source country generating the second most IPv4 traffic is not a surprise, given that US-based social networks (Facebook, Twitter, LinkedIn) and content providers (Akamai, Google, Microsoft) generate a significant portion of the worldwide traffic [120]. This is confirmed both by manual inspection of the traffic and by Figure 4.19, which shows that almost half of the HTTPS traffic, which is commonly used by those services, is going to and coming from the United States. Although the headquarters of these companies are all in the United States, their data are usually hosted closer to the service users, in regional data centres. This is achieved by using geolocation-aware DNS (GeoDNS), which provides a means for service users to connect to the server that is geographically closest to them. By actively probing the remote hosts in our datasets using ICMP ECHO, it is easily verified that the hosts are definitely not located in the United States, due to the resulting non-transatlantic round-trip times. This is an example of a major inaccuracy of geolocation databases, which geolocate IP addresses to the companies' headquarters instead of the real location of the hosts. The distribution of source countries of IPv6 traffic is shown in Figure 4.18b. Next to the Czech Republic, much traffic is received from Denmark and Ireland. Closer analysis has revealed that the Danish traffic is caused by large repository synchronisation (using FTP) by two data hosting providers. On the other hand, the Irish traffic is generated mainly by Google and Facebook. Surprisingly enough, this traffic is geolocated to its real physical location. Google is known to have a large data centre in Ireland, and some Facebook partners, such as Zynga, host their services in the Amazon data centre in Ireland as well.


4.3.6 Conclusions

In this section, we presented two prototypes for flow-based geolocation in large-scale networks. Existing approaches to flow-based geolocation have been shown to be either unsuitable for deployment in those networks or non-transparent to flow collectors. By integrating geolocation into flow exporters and collectors, we include country-level information in flow data before the actual data analysis takes place. As such, geolocation is performed in real time as a pre-processing step, instead of the usual post-processing. This allows network administrators to filter, aggregate, and generate statistics based on the geolocated data in a continuous fashion. Measurements have shown that the performance footprint of the geolocation process on the actual flow export and collection is negligible. To demonstrate the applicability of real-time geolocation in large-scale networks, we have shown several examples by means of two use cases. For example, the use case on anomaly detection has shown that geolocation can dramatically reduce the dataset to be analysed by analysis applications. If certain countries are expected to generate more malicious traffic than others, traffic from these countries can be pre-filtered. Analysis applications, such as anomaly detection systems, can then analyse the smaller dataset, which results in shorter detection times. The implementation of the FlowMon geolocation plugin and the nfdump geolocation patches are available at http://ics.muni.cz/~celeda/geolocation/.

4.4 Network Traffic Characterisation Using Flow-Based Statistics

A lot of research is being performed in the areas of network traffic classification, anomaly detection, and network security in general. Researchers involved in these areas often evaluate their methods using live network traffic. However, performing research on live network traffic involves several caveats. First, repeating experiments on live traffic is infeasible. Second, the traffic has to be thoroughly described, since most methods heavily depend on the properties of the observed traffic. Last but not least, the network traffic's properties can change during the research cycle, which can lead to suboptimal results. Although this work does not directly utilise application flow monitoring, it provides a method of ensuring the repeatability of flow monitoring experiments, which is useful both for basic and application flow monitoring. To be able to repeat the experiments, a packet trace must be captured. Considering the speed and utilisation of the current uplinks of research organisations, hundreds or even thousands of gigabytes of network data would have to be stored. Together with privacy issues, this makes sharing such datasets almost impossible. A description of the datasets is an important part of any work concerning network traffic. The properties of the traffic highly correlate with the results of any experiments. In case the researchers are not able to share their dataset, its description should allow other researchers to find a similar dataset which would exhibit similar results. Moreover, such a description can be checked for consistency, thus ensuring that the results do not unduly fluctuate over time. The goal of this work is to provide a simple method for discerning different types of network traffic. This method allows us to compare network traffic from live networks and even packet traces and determine which traffic samples show similar properties. Finding similar traffic is highly desirable, as it allows the independent evaluation of published experiments. Moreover, when applying the method continuously to a specific network, changes in the properties of the network can be not only observed but also quantified. Flow-based characteristics [15] are used by our method, as they are easier to obtain than packet-based characteristics. Moore et al. [172] list a number of discriminators that can be used for classifying traffic. We select a small subset of discriminators obtained from NetFlow v5 [20] records that are available from most network devices. Our approach is generic, and the set of

discriminators can be extended for specific purposes. We show that the chosen discriminators provide enough information to distinguish between various networks. We analyse the traffic of five campus networks within Masaryk University. The networks have different properties, as they contain different numbers of servers, workstations, personal client stations, and portable devices. The networks and the methodology are described in detail so that other researchers can repeat our steps and evaluate their own network traffic. We show that it is possible to distinguish between networks based on simple characteristics derived from flow information. Day-night and weekday patterns in the traffic are important phenomena that need to be taken into consideration when deriving the characteristics. The characteristics used in this section do not show any significant correlation, which indicates that all of them contribute to the description of the network traffic. The rest of the section is structured as follows. Related work is surveyed in Subsection 4.4.1. We describe the methodology for characterising network traffic in Subsection 4.4.2. The results of applying the proposed method to several different networks are shown in Subsection 4.4.3. The section is concluded in Subsection 4.4.4.

4.4.1 Related Work

Statistical properties of Ethernet traffic were studied by Leland et al. [173]. They discovered that the traffic is statistically self-similar, which was later confirmed by several studies [174, 175]. These studies also showed that detailed characteristics, such as packet inter-arrival times, show large deviations and burstiness. Our study makes use of traffic flow properties which are aggregated over the whole network traffic and are therefore much more stable. Fraleigh et al. [176] performed packet-level traffic measurement on the Sprint IP backbone. They showed that the measurement interval affects peaks in packet rate and bit rate. Using flow-based analysis makes it possible to normalise the data; otherwise, it would be more difficult to compare the results from different networks. The authors also showed that different networks have different week and day-night patterns, as well as traffic types and packet size distributions. There was also a discrepancy between flows per second and bits per second for different networks. The authors argued that the network properties depend on the customer type of the link and its location. Fomenkov et al. [177] studied network traffic behaviour on long-term data samples. The samples were captured from different networks, one to eight times a day, once per month, for 60 to 120 seconds. Only packet headers were stored. The authors used packet, byte, and flow rates as well as the number of source and destination IP addresses to describe long-term changes in Internet traffic characteristics. The burstiness of the traffic in combination with short measurement intervals obscured the day-night and weekday patterns. Even the expected long-term growth of the traffic was not observed. This advocates using longer measurement intervals in our own work. Despite the intricate nature of the samples used, the authors reported that the average packet length increases with traffic growth. They also observed differences in the composition of traffic transport protocol usage between different networks. A report of traffic properties measured on a single backbone network link was given by John and Tafvelin in [178]. They collected 148 traces of 20 minutes duration in a single month. The report focused on IP and TCP protocol usage, namely the distributions of IP transport protocols, IP protocol properties, and IP and TCP options and flags. The authors also confirmed that the packet size distribution became bimodal, as reported by some of the previous studies. A DoS attack was discovered in the collected data, which caused an anomaly in the number of UDP packets in one of the samples. The report implies that knowing the composition of network traffic is important for research in anomaly detection and network security.


Network                          Packets      Bytes        Flows
Faculty of Informatics             227.1 G      236.4 T      3.6 G
Institute of Computer Science      107.3 G      106.2 T      0.7 G
University Campus Bohunice         449.8 G      473.9 T      4.1 G
Virtual Switching Segment        1 119.2 G    1 158.3 T     11.7 G
Masaryk University               1 366.6 G    1 427.7 T     20.1 G

Table 4.12: Measured Networks.

Benson et al. [179] described the traffic characteristics of data centres. They performed a top-down analysis of ten data centres, identifying the applications used and their communication patterns. The authors observed flow-level and packet-level communication characteristics such as active flows per interval, the distribution of flow sizes and lengths, packet sizes, and packet inter-arrival times. They noticed day-night and weekly patterns in the communication and that there is a difference between core and edge links as well as between some of the data centres. Their analysis supports our belief that it should be possible to differentiate between network links based on traffic characteristics. A characterisation of ISP traffic was performed by García-Dorado et al. [180]. The authors attempted to provide accurate, extensive, and quantitative measurements of application usage, bandwidth utilisation, and user preferences. They compared customers of different networks and access technologies over long periods of time. Unlike previous findings, the daily patterns are reported to be quite invariant, although the weekdays were different from weekends in a campus network. Application traffic shares differed significantly between the individual networks. Moreover, notable changes were observed on the same link during the measurement period. This supports our assumption that measurements on live networks should be well documented and validated. All of the above-cited works focus on describing the properties and characteristics of network traffic. The key difference to our work is that we identify characteristics that vary between the networks and show that they can be used to differentiate and describe the networks in a uniform manner.

4.4.2 Methodology

To be able to describe network traffic and find key properties that allow us to discern different networks, we collected data from several different parts of our campus network. Understanding the purpose of each network is imperative for the correct interpretation of the data collected. Therefore we describe the networks used for collecting the data in detail. This section describes the processes and tools used to collect the data and the collected data itself.

A Description of the Networks

We continually collect data from individual networks on our campus. We chose five networks that differ considerably in their nature and utilisation. For each network, we selected two months of data, January and March. The networks are expected to exhibit different characteristics between the two months, since there is an exam period at Masaryk University in January, whereas March is a standard midterm month. A summary of the networks and the collected data is presented in Table 4.12. The Faculty of Informatics (FI) network represents a network of a university faculty. It connects staff offices, computer labs, and faculty servers. The faculty has its own Eduroam infrastructure, which can be observed in the network.


Figure 4.20: Flow Monitoring Architecture Schema (monitored uplink network → optical TAP → mirrored traffic → flow probe → IPFIX → flow collector).

The network also contains the servers with the information system for the entire Masaryk University. The Institute of Computer Science (ICS) has its own network connecting staff offices and a small server infrastructure to support office computers, such as remote storage or update servers. Unlike the Faculty of Informatics, it does not connect computer labs. University Campus Bohunice (UCB) is a large campus building with hundreds of offices, computer labs, and a large library. The Faculty of Sports Studies and the Faculty of Medicine are situated in the campus building. The Central European Institute of Technology (CEITEC) is also located on these premises, and it generates a large volume of data due to intensive scientific computing. The Virtual Switching Segment (VSS) contains a server segment and also the Eduroam wireless network concentrator. Every Eduroam connection at the university goes through this network. Servers supporting the Masaryk University IT infrastructure, such as the Economic and Administrative Information System of Masaryk University or digital libraries, are also located in this network. Masaryk University's (MU) network is measured as a whole at the uplink to the ISP. The communication of every subnet belonging to the university is observed at this point, with the exception of internal communication.

Data Collection

We have a flow probe [15] monitoring each of the networks in our campus. The data from the probes is sent to a flow collector, where it is stored for further processing. The flow monitoring architecture is depicted in Figure 4.20. The flow exporters are configured to create flows with an active timeout of 300 seconds and an inactive timeout of 60 seconds. The flows are terminated using only these timeouts; the TCP FIN and RST flags are not used for flow termination. The exporters use a standard flow key consisting of the IP addresses, the L4 protocol, and its ports. It is important that the exporters have the same configuration, since different configurations may result in a different number of exported flow records. As we will show, the number of flow records is an important characteristic of the network. The flows from all probes are collected on a single collector. No sampling is used in the flow creation or collection process; therefore we have a complete flow trace of the measured network in a given period of time.
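The expiration policy described above can be summarised in a minimal sketch. It assumes a flow cache keyed by the standard flow key and records with first_seen and last_seen timestamps; these names are illustrative, not the exporter's implementation.

ACTIVE_TIMEOUT = 300.0    # seconds; maximum lifetime of a cached flow
INACTIVE_TIMEOUT = 60.0   # seconds; maximum idle time of a cached flow

def expired(flow, now):
    """A flow expires on the active or inactive timeout only; TCP flags are ignored."""
    return (now - flow["first_seen"] >= ACTIVE_TIMEOUT
            or now - flow["last_seen"] >= INACTIVE_TIMEOUT)

def expire_cache(cache, now):
    """Remove and return all expired records from a cache keyed by
    (src IP, dst IP, L4 protocol, src port, dst port)."""
    expired_keys = [key for key, flow in cache.items() if expired(flow, now)]
    return [cache.pop(key) for key in expired_keys]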


Type              Statistic
Basic volume      Bytes per second
                  Packets per second
                  Flows per second
Advanced volume   Flow size in packets, bytes
                  Flow connection length
Layer 3           Source host count
                  Destination host count
                  IPv4 / IPv6 bytes, packets, flows
Layer 4           IPv4 / IPv6 TCP bytes, packets, flows
                  IPv4 / IPv6 UDP bytes, packets, flows
                  IPv4 / IPv6 ICMP bytes, packets, flows
                  IPv4 / IPv6 other protocol bytes, packets, flows
Layer 7           Top 10 ports by packets, bytes, flows

Table 4.13: Collected Statistics.

Data Preprocessing

This subsection describes how the raw flow data was processed. First, we generate statistics for short intervals so that the amount of data is reduced. These statistics do not carry any personal information; therefore they can be shared with other researchers without privacy concerns. Then we process these statistics and look at daily and weekly patterns, their averages, and deviations. We identify characteristics which differ between the networks, describe them, and use them to differentiate between the networks. To be able to compare the traffic characteristics of different networks, we need to normalise the volumetric properties of the traffic. Individual flow records are not needed to describe network traffic. Instead, the records must be aggregated, and statistical indicators must be derived from the raw data. Since most flow-based frameworks process data in 5-minute intervals [73], we chose the same interval for generating the aggregated statistics. We generate five types of statistics from the original flow records: Basic volume, Advanced volume, Layer 3, Layer 4, and Layer 7 statistics. These statistics are listed in Table 4.13. There are 41 collected statistics in total when the bytes, packets, and flows statistics are counted separately. The Advanced volume statistics represent the detailed volumetric characteristics of the network. Therefore, we compute the mean, variance, standard deviation, median, and all percentiles from 0.55 to 0.95 with an increment of 0.05. Percentiles lower than the median are superfluous, since half of the flows in every network are single-packet flows, which amount to zero-length flows with small packet sizes (typically 40 bytes). We also compute the covariance and correlation for each pair of these statistics. The port statistics are stored as a table of the top 10 ports sorted by either packets, bytes, or flows. We also compute the overall top port statistic for our analysis.
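A minimal sketch of the per-interval aggregation of the Advanced volume statistics is shown below. It assumes the per-flow values for one 5-minute interval (e.g. flow sizes in packets) are already available as a list; NumPy is used only for brevity.

import numpy as np

def advanced_volume_stats(values):
    """Summarise one 5-minute interval of per-flow values
    (mean, variance, standard deviation, median, 55th-95th percentiles)."""
    v = np.asarray(values, dtype=float)
    stats = {
        "mean": float(np.mean(v)),
        "variance": float(np.var(v)),
        "stddev": float(np.std(v)),
        "median": float(np.median(v)),
    }
    for p in range(55, 100, 5):
        stats[f"p{p}"] = float(np.percentile(v, p))
    return stats

# Example: packets-per-flow values observed in one interval.
print(advanced_volume_stats([1, 1, 1, 2, 3, 10, 250]))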

4.4.3 Results

We carefully considered all the acquired data and divided it into seven areas of network behaviour that are studied in more detail. This section describes our findings and the decisions taken when analysing the collected data.


Figure 4.21: Flows per Second in the UCB Network.

Figure 4.22: Flows per Second in the VSS Network.

Day-Night Pattern

The day-night pattern is a key element of change in every network with human users. It is evident that we cannot compare traffic captured during the day with traffic captured at night. The reason is that the traffic during the night generally contains much less user-generated traffic and its properties are very different, as we show further in our analysis. The day-night pattern is best observed in flows per second, since this expresses the number of connections. Using bytes or packets per second distorts the pattern, since there are heavier flows during the day than at night [181]. It is possible to use the number of hosts to show the day-night pattern, but the results are very strongly correlated; therefore we decided to use flows per second. The day-night behaviour of the network discloses a lot of information about the purpose of the network. Figure 4.21 shows flows per second in the University Campus Bohunice network in January. The data from all the days is superimposed in the graph so that the day-night pattern is clearly visible. The red colour shows the weekends, and the black is for workdays. It is even possible to see a lunch break in the data at half past eleven. The weekends are flat and do not exhibit the day-night pattern, since there are not many users at the weekend.


                          Day-Night Ratio     Week Ratio
Network   Day   Night    January   March     January   March
UCB        13     5        1.96     2.08       1.85     1.87
ICS        13     1        2.25     1.70       2.24     1.52
MU         14     5        3.08     3.62       1.45     1.95
FI         10     5        2.04     2.16       1.47     1.64
VSS        14     5        2.16     2.73       1.43     1.77

Table 4.14: Day to Night and Workday to Weekend Ratio According to Flows per Second.

Figure 4.22 shows the Virtual Switching Segment network which has quite different proper- ties. Since there are Eduroam users and servers that are accessed from outside the university, we can see that there is a day-night pattern visible even on the weekends. Moreover, we can see periodical communication which is caused by automated server backups and updates. For each network, the most and the least busy hour of the day according to flows per sec- ond was determined (as shown in Table 4.14). Given the property of the network, the day-night ratio will be, from now on, computed as a ratio between the average of the property during the busiest hour in the day and the least busy hour at night. Using day-night ratios specific to each network allows us to compare the behaviour of each network regardless of the absolute size of the network. Table 4.14 also describes the day-night pattern for all networks. We processed the data from January and March 2015 separately to show the difference between the exam period and the teaching period. The day-night ratio expresses the ratio of flow count as described earlier. The largest difference between the day-night ratios in January and March is in the traffic fromthe entire university and the VSS traffic. This is a seasonal change caused by students leaving the campus and returning only for their exams. The traffic from faculty networks (FI, VSS) increases only slightly in the teaching period, indicating that the faculty networks are not highly utilised by students. The ICS network shows the opposite trend, which is likely caused by project dead- lines and people working harder after the Christmas vacation to meet them. We also computed an interval which has the maximal ratio of average flows per second to the rest of the day. We have observed two types of network. The activity in networks FI, MU, and VSS start at circa 7:00 and ends at 23:00, while the activity in networks UCB and ICS start at 8:00 and ends at circa 17:00. This can be attributed to the fact that UCB and ICS networks are used while people work, and the rest of the networks are used while people are awake. The day-night pattern provides important information about the networks and their usage. We can use figures from Table 4.14 to describe the day-night pattern of these networks.

Weekday Pattern

Another significant pattern that can be observed in network traffic is the weekday pattern. Figures 4.23 and 4.24 show the weekday pattern in January for UCB and VSS. The graphs are constructed by superimposing the data from individual weeks. There are significant differences between the networks. The weekend traffic at UCB is almost flat, but VSS shows behaviour that suggests that a significant number of users communicate over the network even at weekends. The difference between workdays and weekends can be expressed as the ratio of average flows per second measured during the busiest hours on workdays and at weekends. Table 4.14 shows the results for January and March. This characteristic shows a correlation with the day-night pattern. We presume that if the network is accessed by other users, such as students, outside working hours, it will probably also be accessed on weekends by the same users.


Figure 4.23: Flows per Second in the UCB Network.

Figure 4.24: Flows per Second in the VSS Network.

However, we can still employ this statistic to differentiate between networks since the correlation is not strong and the statistic still adds new information.

Flow Characteristics

Using basic network characteristics such as packets per flow, bytes per packet, flow length, or the number of hosts is difficult due to the day-night and weekly patterns. The variance of these variables renders averaging the values useless. Therefore, we have taken another approach. Average values are computed for whole weeks, and these averages are used as base values. The variance of the week averages computed over the months is on average less than 10 %, which is low enough to use these values to characterise the networks. The bytes per packet statistic had little variance over the measured networks, and it gives almost no information to discern the networks. We found that the flow length and packets per flow statistics differ more significantly, as shown in Table 4.15, and can be used to discern the networks. The day-night ratios show that networks with similar averages might have different day-night behaviour for the same statistics.


          Length of the Flow             Packets per Flow
Network   Week avg.   Day-night rat.     Week avg.   Day-night rat.
UCB       10.07 s     2.18               112.86      1.28
ICS       10.34 s     1.68               205.36      0.66
MU        13.09 s     2.13                71.79      0.81
FI         5.40 s     1.77                67.60      0.70
VSS        7.14 s     2.04               106.25      0.94

Table 4.15: Average Length of Flow and Packets per Flow.

          Source Hosts          Destination Hosts
Network   January   March       January   March
UCB       2.66      2.51        1.91      1.90
ICS       2.76      3.11        1.99      2.35
MU        2.49      2.66        1.65      2.08
FI        1.24      1.35        1.22      1.31
VSS       2.93      4.93        2.48      4.12

Table 4.16: Source and Destination Hosts Day-Night Ratios.

Therefore, the ratios add new information to the weekly averages. The average number of hosts is a statistic directly related to the size of the network and cannot be used to describe the type of the network without normalisation. Therefore, we use only the day-night host ratios, as shown in Table 4.16, which are independent of the absolute number of hosts. Source and destination directions are determined by the direction of the flow. There is a significant difference between the host ratios in January and March for the VSS network. The UCB network, in comparison, shows almost no change, as the ratios remain nearly the same. This shows that the statistic can be used to find differences between the networks. We have shown that comparing basic characteristics of networks must be done over sufficiently large intervals that are the same for all networks. In our case, the average values taken over an entire week proved to be stable enough to be used as a network characteristic. The ratio between characteristics measured during the day and the night also provides relevant information about the behaviour of the network and can be used as an important network characteristic.

IPv6 Utilisation

The utilisation of the IPv6 protocol in the network is a Layer 3 characteristic that indicates the technological readiness of the network. The adoption of IPv6 is slow, and IPv4 traffic is still dominant. We use this fact to differentiate between networks with different levels of IPv6 readiness. The amount of IPv6 traffic is affected by IPv6-ready servers in the networks and IPv6 services accessed by users outside the network. Table 4.17 shows the ratio of IPv6 to total traffic in flows, packets, and bytes. The packets and bytes ratios are strongly correlated; however, they differ from the flows statistic. The UCB, ICS, and MU networks clearly stand out in these characteristics.

Protocol Share

We investigated the differences in Layer 4 protocol usage in the datasets. TCP and UDP are the dominating protocols in all networks, and the shares of ICMP and other protocols are so negligible that they can hardly be used to describe the networks.


Network   Flows      Packets    Bytes
UCB        0.02 %     0.01 %     0.00 %
ICS        5.58 %    12.35 %    13.57 %
MU        12.98 %     1.94 %     1.41 %
FI         3.22 %     2.04 %     2.10 %
VSS        4.66 %     0.23 %     0.17 %

Table 4.17: IPv6 Utilisation.

          Day                     Night
Network   TCP        UDP          TCP        UDP
UCB       38.52 %    59.76 %      19.44 %    77.66 %
ICS       41.26 %    56.88 %      28.20 %    68.84 %
MU        55.55 %    43.03 %      42.99 %    54.25 %
FI        49.96 %    49.07 %      25.15 %    73.49 %
VSS       30.67 %    67.76 %      19.30 %    78.00 %

Table 4.18: TCP and UDP Share in Day and Night in Flows.

The shares of TCP and UDP differ significantly between day and night time. Table 4.18 shows the average TCP and UDP shares during the day and night in flows. The increase in TCP traffic during the day is caused by users, since most of the services use the TCP protocol. We also investigated protocol shares in packets and bytes and found that they match the overall packets per flow and bytes per packet ratios. The results show that the TCP to UDP ratio differs between the networks and can be used for their description.

Most Frequent Ports

Traffic analysis by port numbers provides evidence of application usage in the network. Although port numbers are not considered to be accurate enough for application identification [182], most of the traffic adheres to well-known port numbers, which is accurate enough for our purpose. We computed the top 10 port statistics by flows and packets in January and March for all networks. The flows statistic is more stable and more suitable for analysing the behaviour of the network. We selected ten well-known ports that were most often observed in the statistic. Table 4.19 shows the percentage of flows that belong to these ports. We can observe that the networks have very different usage of the ports, which makes port usage an important characteristic of the network. Note that we did not study the day-night variance of port usage, but we expect that its fluctuation may be used to refine the results.

4.4.4 Conclusions

We have presented an analysis of network traffic measured at five different campus networks at Masaryk University. Our goal was to show that the properties of the networks can be extracted and quantified and that we can use the results to differentiate and describe the networks. We have made several observations during our analysis which affect the derivation of network characteristics.

• Flow-based statistics are more stable than byte or packet-based statistics. Therefore, it is more practical to use flow statistics to track long-term changes in network behaviour.


Service    Port    UCB       ICS       MU        FI        VSS
DNS         53      9.7 %    30.2 %    21.5 %    12.2 %    42.7 %
HTTP(S)     80      9.2 %     7.4 %    20.2 %    16.8 %     5.9 %
           443      6.5 %     8.3 %    14.7 %    20.1 %     4.0 %
Mail        25      –         –         1.0 %     0.6 %     –
           993      –         1.7 %     –         0.6 %     –
Samba      445      1.0 %     –         –         –         0.7 %
SSH         22      –         –         1.4 %     0.3 %     –
NTP        123      –         0.9 %     7.1 %    43.9 %     –
SNMP       161     52.8 %    11.9 %     –         –        23.5 %
Telnet      23      1.0 %     1.3 %     1.6 %     0.4 %     –

Table 4.19: The Most Common Ports by Flows in January.

• Day-night and weekday patterns must be taken into consideration when computing network characteristics.

• Sufficiently long samples of the traffic must be available.

• Using ratios between day and night helps to compensate for the day-night pattern and the size of the network. However, there is still a slight correlation between network size and the day-night ratios.

• To describe a network used to perform an experiment, the description of behaviour using network characteristics should be complemented with absolute volumetric information.

Our measurements have confirmed that it is possible to differentiate between networks by utilising the observed characteristics. Moreover, the campus networks showed quite stable characteristics over time, even though the measurements were taken during the winter exam and spring teaching periods. Furthermore, we tested the correlation of the described characteristics and found that they are only very weakly correlated. This shows that each of the characteristics is valuable and carries information about the network. However, the presented characteristics are by no means complete. We analysed a subset of possible characteristics derived from flow records which can be easily measured on a network. Thus, we believe that more work is required to identify other useful characteristics and utilise them to describe the behaviour of the networks.

4.5 Summary

In the first section, we presented an analysis of HTTP traffic in a large-scale environment which uses application flow monitoring with HTTP protocol processing. In contrast to previously published analyses, we were the first to classify patterns of HTTP traffic which are relevant to network security. We described three classes of HTTP traffic which contain brute-force password attacks, connections to proxies, HTTP scanners, and web crawlers. Using the classification, we were able to detect up to 16 previously undetectable brute-force password attacks and 19 HTTP scans per day in our campus network. The activity of proxy servers and web crawlers was also observed. Symptoms of these attacks may be detected by other methods based on basic flow monitoring, but detection using the analysis of HTTP requests is more straightforward. We thus confirm the added value of application flow monitoring in comparison to the traditional method.


The exhaustion of the IPv4 address space increases pressure on network operators and content providers to continue the transition to IPv6. IPv6 transition mechanisms such as Teredo and 6to4 allow IPv4 hosts to connect to IPv6 hosts. On the other hand, they increase network complexity and render many methods for observing IP traffic ineffective. In Section 4.2, we extended our flow-based measurement system to include transition mechanism information to provide full IPv6 visibility. Our traffic analysis focuses on IPv6 tunnelled traffic and uses data collected over one week in the Czech national research and education network. The results expose various traffic characteristics of native and tunnelled IPv6 traffic, among others the TTL and Hop Limit distributions, the geolocation aspect of the traffic, and the list of Teredo servers used in the network. Furthermore, we show how the traffic of IPv6 transition mechanisms evolved between 2010 and 2013.

The importance of IP address geolocation has increased significantly in recent years due to its applications in business advertisements and security analysis, among others. Current approaches perform geolocation mostly on-demand and in a small-scale fashion. As soon as geolocation needs to be performed in real time and in high-speed and large-scale networks, these approaches are not scalable anymore. To solve this problem, we propose two approaches to large-scale geolocation in Section 4.3. Firstly, we present an exporter-based approach, which adds geolocation data to flow records in a way that is transparent to any flow collector. Secondly, we present a flow collector-based approach, which adds native geolocation to NetFlow data from any flow exporter. After presenting prototypes for both approaches, we demonstrate the applicability of large-scale geolocation by means of use cases. Our prototypes have been shown to be scalable enough for deployment on the 10 Gbps Internet connection of Masaryk University.

Performing research on live network traffic requires the traffic to be well documented and described. The results of such research are heavily dependent on the particular network. Section 4.4 presents a study of network characteristics which can be used to describe the behaviour of a network. We propose a number of characteristics that can be collected from the networks and evaluate them on five different networks of Masaryk University. The proposed characteristics cover the IP, transport, and application layers of the network traffic. Moreover, they reflect the strong day-night and weekday patterns that are present in most of the networks. Variation in the characteristics of the networks indicates that they can be used for the description and differentiation of the networks. Furthermore, the weak correlation between the chosen characteristics implies their independence and contribution to the network description.


Chapter 5

Flow Monitoring Performance

Monitoring of high-speed networks is becoming a resource-intensive task. There are dedicated flow monitoring probes built with commodity hardware that support up to 10 G links, but multiple 10 G or even 100 G optical networks are being used for transport networks and data centre connectivity. Moreover, application flow monitoring is even more demanding than basic flow monitoring. Therefore, the performance of flow monitoring is an important topic. The goal of this chapter is to describe how the performance of flow monitoring can be measured and what steps can be taken to improve it. We describe the state-of-the-art in packet capture, which is an essential part of every flow monitoring system. We show which parts of packet capture directly influence the design of the flow monitoring process and discuss how flow monitoring can take advantage of the different features provided by packet capture frameworks. A description of the most impactful optimisation techniques, both with and without the use of specialised FPGA-based network interface cards, is provided. Moreover, several novel techniques for improving flow monitoring performance are proposed as well. Running and maintaining separate probes to monitor multiple links is uneconomical and time-consuming. Therefore, we explore the possibility of using network interface cards (NICs) with multiple 10 G interfaces to build probes which can replace many existing boxes, leading to reduced management and operational costs. Monitoring performance is critical for such a high-density solution. We use two custom-built, FPGA-based NICs, each with eight 10 G interfaces, to test current CPU limits and to propose improvements for near-future commodity NICs. The paper included in this chapter is [A13]. The paper related to this chapter is [A14]. The organisation of this chapter is as follows:

• Section 5.1 provides background to the measurement of flow monitoring performance and highlights its pitfalls.

• Section 5.2 describes the state-of-the-art of high-speed packet capture, which is an important part of every flow monitoring system.

• Section 5.3 shows how flow monitoring can be accelerated with the use of specialised FPGA-based network interface cards and by optimising the flow exporter software.

• Section 5.4 presents a high-density flow monitoring system capable of monitoring sixteen 10 G links in a single box and evaluates its performance.

• Section 5.5 summarises the chapter.


5.1 Measuring Flow Monitoring Performance

The primary goal of measuring flow monitoring performance is to determine whether the system can be deployed to monitor a particular network or link. For example, if the system is capable of monitoring 20 Gb/s of traffic, it can be deployed on both directions of a 10 Gb/s link. However, the claim that the system can handle 20 Gb/s may mean a lot of different things, and its meaning usually differs from vendor to vendor. The throughput of a flow monitoring process depends on a number of factors, such as the configuration of the flow creation process and the type of network traffic that is being monitored. 10 Gb/s may mean 14.88 Mpps (millions of packets per second) for minimum-size Ethernet frames (20 B + 64 B) or just 813 kpps for maximum-size Ethernet frames at the 1500 B MTU (each frame is 1538 B). The processing of different packet headers can take different amounts of time as well, for example, IPv4 vs IPv6 or UDP vs TCP. The number of flows has a large impact on the flow creation process, as a flow record must be kept for each of the flows. Therefore, a detailed performance analysis of the flow monitoring system should be performed before its real-world deployment. The throughput of the flow monitoring system is usually measured either in packets per second or in bits per second. However, since the packet size should be reported as well so that these values provide meaningful information, they are mostly interchangeable. The packets per second metric is preferred in the rest of this chapter. The goal of this section is two-fold. Firstly, it describes the parameters that need to be specified when reporting on flow monitoring system performance. There are three different categories of parameters: hardware configuration, exporter configuration, and traffic composition. Secondly, measurement of the impact of optimisations is discussed. Although the impact can be measured simply by an increase in the overall throughput, there are a few pitfalls that need to be recognised and avoided for the results to be interpreted correctly.
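The packet rates quoted above follow directly from the line rate and the per-frame overhead on the wire (a 64 B minimum frame or a 1518 B maximum frame, plus 20 B of preamble and inter-frame gap):

\[
\frac{10 \times 10^{9}\ \text{b/s}}{(20 + 64)\ \text{B} \times 8\ \text{b/B}} \approx 14.88\ \text{Mpps},
\qquad
\frac{10 \times 10^{9}\ \text{b/s}}{1538\ \text{B} \times 8\ \text{b/B}} \approx 813\ \text{kpps}.
\]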

5.1.1 Measuring Overall Throughput

The overall throughput of the flow monitoring process depends upon a number of factors. Moreover, there are two methods of measuring the throughput even when all of these factors are set. The first method is to generate the testing traffic at a rate that is above what the monitoring process is able to handle. The throughput is then determined as the number of packets that were actually processed by the flow exporter. This method gives an upper bound on the capabilities of the monitoring process, and the measurement is quite easy to carry out. However, deploying the flow monitoring system based on such an estimate can easily lead to dropped packets in practice, since there is quite a large packet drop associated with the measurement as well. The second method measures throughput without packet loss or with a defined packet loss rate. Given a predetermined packet loss rate, the traffic is generated so that the flow monitoring system performs with a packet loss rate lower than the given one. The throughput is then increased until the packet loss matches the predetermined value, and the throughput at that point is considered to be the result of the measurement. Both performance measurement methods are important. The first one gives an upper bound on the capabilities of the flow monitoring process, and the second one shows how much traffic can be safely processed with acceptable packet loss.

When evaluating the performance of a flow monitoring process, the hardware on which it runs must be taken into consideration. The performance depends on the frequency of the processor, the available instruction sets, and the size of the caches. When the exporter software is able to utilise multiple threads, the number of cores of the processor and the number of processors matter as well. Moreover, when there is more than one processor, the correct use of the Non-Uniform Memory Access (NUMA) architecture is important. The buffers of processes and threads running on one CPU should be in memory connected to the same CPU. Otherwise, the latency for accessing this memory is increased, as the data must be accessed through the memory controller of another processor via a bus, such as the QuickPath Interconnect [183] bus or its newer version, Ultra Path Interconnect (in the case of Intel processors).

The choice of packet capture solution often limits the upper bound on the throughput of the flow monitoring process. Although packet capture is an important part of the flow monitoring process, it is used for other tasks as well and is of interest to various researchers [184, 185]. High-speed packet capture is described in more detail in Section 5.2.

The configuration of the flow exporter has a significant impact on the throughput as well. The active and inactive timeouts affect the number of created flows and flow records considerably [15], which in turn affects the utilisation of the flow cache and therefore the total throughput. The flow cache size should be tuned to the expected number of flows so that data locality is preserved as much as possible. Moreover, the size of flow records depends on the number of extracted properties, and it affects the size of the cache as well. The more properties are extracted, the larger the flow record. Naturally, the number of extracted properties has a direct impact on the throughput, since each property must be extracted and stored in the flow record, which requires computing power and more memory accesses. We have demonstrated the impact of the number of extracted properties on the performance in Section 3.5.

One of the most important aspects of throughput measurement is the used dataset. Choosing the shortest packets and a single flow allows benchmarking the packet parser, while the shortest packets and a large number of flows put great stress on the flow cache. Using more complex packets with multiple protocol layers tests the packet parser capabilities and performance as well. The relative sizes of flows in the dataset matter too. When there are a few dominant flows, the corresponding flow records are kept cached, and the overall performance increases. However, if the packets of the flows are equally interlaced, the caching of the flow records is limited. Since the choice of the dataset has such an extensive impact on the flow monitoring process and its throughput, it is imperative to choose a dataset that supports the intentions of the given benchmark. Guidance for choosing a dataset can be found in the work of the IETF IP Performance Measurement working group [186] and the IETF Benchmarking Methodology working group [187], especially in [188, 189].

We have argued that there are many factors that influence the throughput of a flow monitoring system. Since it is clearly not possible to test all combinations of the input parameters, it is necessary to choose a set of appropriate settings that either cover the most extreme scenarios or approximate some important real-world scenario. The work of Nassopulos et al. [185] shows that performing a measurement using a large number of configurations allows deriving results that can help with the optimisation effort. They show that the implementation of the flow cache as well as the composition of the traffic has a large impact on the flow monitoring process. Regardless of the chosen parameters, a proper description of the used configuration and dataset must always accompany all presented results. Therefore, we recommend using a synthetic dataset that can be described in a few simple rules.

5.1.2 Measuring the Impact of Optimizations

Throughput measurement is an integral part of any performance optimisation effort. The goal of the measurement in this case is two-fold. Firstly, bottlenecks need to be identified so that the optimisation can be effective. For example, if a flow cache lookup takes 70 % of the total processing time for a single packet, optimising the packet parser is not very effective. Secondly, the measurement is repeated to examine the effect of a given optimisation. Moreover, it is necessary to verify that optimising for a single use case does not have a negative impact on other use cases. Therefore, thorough performance measurement is often repeated during any optimisation process.


Identifying bottlenecks can be a complicated process. It is often beneficial to measure the execution time spent on processing a single packet throughout the flow monitoring process. The authors of [190] show that it is possible to roughly estimate the time required for packet reception and parsing based on several parameters. Gallenmüller et al. apply a similar methodology to compare the performance of several high-performance packet processing frameworks in [191]. Using a similar process can help to identify bottlenecks in the flow processing as well.

The processing speed of a computer is limited by the speed of the processor and the time required to fetch data from memory. If a computation requires only a small amount of memory that fits in the CPU registers or the L1 cache, the processor can be fully utilised. However, when a lot of data needs to be processed, there are many uncached accesses to memory with high latency. This causes the CPU utilisation to be low. Moreover, the memory bus utilisation is higher, since the data needs to be transferred from RAM to the CPU cache more often. Therefore, it is very useful to measure the number of instructions per processor cycle (IPC). The higher the IPC, the better the CPU is utilised. When optimising such a process, the computation should be simplified. However, when the IPC is low, the CPU waits for data from memory. In such a case, optimising data locality and memory accesses is the best way to improve the performance.

Since multi-core CPUs are the standard nowadays, an excellent way to improve performance is to process the data using multiple threads. However, it is important to keep in mind that the bandwidth of the CPU memory controller is limited and that passing data between the threads consumes this bandwidth as well. Finding the right balance is often a difficult task which requires experimentation and thorough measurement of the system load.

A good way to understand the relationship between CPU processing power and memory bandwidth is to use the Roofline Performance Model [192]. Its core parameter is Arithmetic Intensity, which is the ratio of total floating-point operations to total data movement. The visualisation of the model shows both the performance limitation of the implementation and the limitations of the used architecture. However, correctly estimating the arithmetic intensity of the flow monitoring process is a challenging task.

The optimisations proposed in Section 5.3 either reduce the amount of processed data or reduce the number of computation steps necessary to process each packet. We also discuss multithreaded flow processing and point out the pitfalls and caveats that can result in incorrect results.
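The IPC metric mentioned above can be obtained with profilers such as perf, or directly from the hardware performance counters. The following sketch, assuming a Linux system with accessible performance counters, reads the instruction and cycle counters around a section of code via the perf_event_open interface; the measured loop is only a stand-in for the packet processing code of interest.

```c
/* Sketch: measuring instructions per cycle (IPC) of a code section on Linux
   using the perf_event_open interface. Error handling is minimal. */
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    /* Measure the calling thread on any CPU. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void) {
    int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    int instrs = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (cycles < 0 || instrs < 0) { perror("perf_event_open"); return 1; }

    ioctl(cycles, PERF_EVENT_IOC_RESET, 0);
    ioctl(instrs, PERF_EVENT_IOC_RESET, 0);
    ioctl(cycles, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(instrs, PERF_EVENT_IOC_ENABLE, 0);

    /* The packet processing loop under test would run here. */
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < 10 * 1000 * 1000ULL; i++)
        sum += i;

    ioctl(cycles, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(instrs, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t c = 0, n = 0;
    read(cycles, &c, sizeof(c));
    read(instrs, &n, sizeof(n));
    printf("IPC: %.2f (%llu instructions, %llu cycles)\n",
           c ? (double)n / (double)c : 0.0,
           (unsigned long long)n, (unsigned long long)c);
    return 0;
}
```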

5.2 High-Speed Packet Capture

Packet capture is the first step of the flow monitoring process and has a significant impact on the overall performance. Moreover, as it is an integral part of other network applications, such as switching and routing [193], high-speed packet capture using commodity hardware running the Linux operating system has been an active research area in recent years. Since the performance of packet capture directly impacts the performance of the whole flow monitoring process, the aim of this section is to present basic principles and significant technologies in this area. Furthermore, some packet capture techniques, such as Receive Side Scaling (RSS), have a direct impact on the functionality of flow monitoring as well. The goal of high-speed packet capture is to deliver packets from the Network Interface Card (NIC) to the user-space application as fast as possible while providing the desired functionality, such as packet fanout to multiple threads and applications. The main difference between high-speed packet capture and standard packet processing in an operating system is that high-speed packet capture is not connection oriented. It does not provide a socket API for applications to communicate through easily. Instead, it allows applications to receive and send raw packets efficiently. However, to understand the packet capture process and its limitations, we briefly describe how the Linux network stack receives packets, as it is the common starting point of all packet capture frameworks.


Figure 5.1: Packet Transition Through Memory in Linux NAPI: (a) without the PACKET_MMAP feature; (b) with the PACKET_MMAP feature.

5.2.1 Linux Network Stack

The basic function of every NIC is to receive a packet from the network and pass it to the RAM where it is processed further. Direct Memory Access (DMA) is used for the data transfer. Its main advantage is that it does not require the cooperation of the CPU and does not consume any CPU cycles. The NIC can access only a limited memory region for which DMA is permitted. In the Linux network stack, this region is part of the kernel memory and is not directly accessible from user-space. The way to notify the kernel of a packet arrival used to be an Interrupt ReQuest (IRQ). However, as network speeds increased to 1 Gbps, full utilisation of the network used to cause an interrupt storm – far too many interrupts for the system to handle efficiently. The original network stack was replaced by the New API (usually called NAPI), which uses polling for high-speed packet reception and falls back to interrupts for lower packet rates. This combination of the interrupt and polling approaches conserves CPU cycles for low traffic and avoids interrupt storms at high packet rates. The NAPI was introduced in Linux kernel versions 2.4.20 and 2.6.

After the packet is present in RAM and the kernel is notified either by an interrupt or by polling, the packet is copied to an internal kernel structure called sk_buff. Modern NIC drivers support placing packets directly into an sk_buff structure, which eliminates this copy and increases performance. The sk_buff structure is then pushed to the kernel network stack, where it is processed and made available to user applications. However, every call to a recv* system function also needs to provide a pointer to a user-space buffer to which the data are copied. Therefore, the packet is copied at least once by the kernel, as demonstrated in Figure 5.1a. Moreover, before the recv* system call can provide data to user-space, each packet needs to be processed by the Linux network stack. The processing often includes the whole TCP/IP stack and is quite complex and resource consuming.

The Linux network stack cannot be used for packet capture because the stack takes care of all layers of the packets and delivers only the application content to the application. However, there is an option to use raw packets, where the application receives and sends complete packets with all headers, including the Ethernet header. The reception of raw packets uses the sk_buff structure, and two copy operations are needed for each packet as well. Despite that, the absence of network stack processing provides a large performance benefit. Moreover, it is possible to utilise the RSS feature provided by some NICs and receive packets in different receive queues, which helps to increase performance as well. However, the standard API for working with packets requires performing a system call for the reception of each packet (two calls when a timestamp is needed as well), which causes frequent switches between user- and kernel-space.
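As an illustration of the raw packet reception described above, the following minimal sketch opens an AF_PACKET socket and reads a few complete frames, one recvfrom() system call per packet. It assumes a Linux host and the CAP_NET_RAW capability; real capture code would add proper error handling and use the batching mechanisms discussed below.

```c
/* Minimal sketch: receiving raw Ethernet frames through an AF_PACKET socket. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* ETH_P_ALL delivers every protocol, including the full Ethernet header. */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    unsigned char frame[2048];
    for (int i = 0; i < 10; i++) {
        /* One system call per received packet -- the overhead discussed above. */
        ssize_t len = recvfrom(fd, frame, sizeof(frame), 0, NULL, NULL);
        if (len < 0) { perror("recvfrom"); break; }
        printf("received a %zd byte frame\n", len);
    }
    close(fd);
    return 0;
}
```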


To provide an efficient way to perform packet capture using the standard Linux kernel and drivers, the PACKET_MMAP [194] feature was added to the kernel. It utilises a buffer shared between the kernel and user-space, as shown in Figure 5.1b. Therefore, the number of packets that can be read in a single batch is not limited, which significantly lowers the number of necessary system calls and thus improves performance. Another advantage of the PACKET_MMAP feature is that the processing of packets can be performed by multiple threads. The distribution of packets can be performed using a hash, round robin, random selection, and several other policies (a minimal sketch of this fanout mechanism is given at the end of this subsection). Although the Linux NAPI has evolved to support fast packet handling, there are several important performance limitations left:

• The packets are copied from the DMA memory region to user-space-accessible buffers.

• The packets are always processed by kernel network stack.

• There is little support for NUMA architecture awareness.

There are several frameworks that aim to remove these limitations to achieve the highest possible throughput for packet I/O operations.
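The fanout mechanism referenced above can be requested on a standard AF_PACKET socket. The sketch below, assuming a Linux kernel with PACKET_FANOUT support, joins a socket to a fanout group with a hash-based policy, so that packets of the same flow always reach the same worker; each worker thread would open its own socket in the same group.

```c
/* Sketch: joining an AF_PACKET fanout group so several sockets (one per worker
   thread) share the captured traffic. PACKET_FANOUT_HASH keeps all packets of
   one flow on one socket. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <sys/socket.h>

int open_fanout_socket(int group_id) {
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return -1; }

    /* Lower 16 bits select the fanout group, upper 16 bits the policy. */
    int fanout_arg = (group_id & 0xffff) | (PACKET_FANOUT_HASH << 16);
    if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
                   &fanout_arg, sizeof(fanout_arg)) < 0) {
        perror("setsockopt(PACKET_FANOUT)");
        return -1;
    }
    return fd;  /* each worker thread calls this with the same group_id */
}
```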

5.2.2 High-Speed Packet I/O Frameworks

Although there were several works analysing the performance of packet capture applications, such as [195], one of the first alternatives to using the Linux network stack for packet capture was PF_RING. The PF_RING socket was created by Luca Deri, and its first description appeared in [196]. PACKET_MMAP was very new at that time, and some of the introduced features, such as memory mapping, overlapped. However, PF_RING had several advantages. Firstly, the packets were not queued into kernel network structures at all. This meant that they were not available to standard applications using the socket API, but only to applications that used the PF_RING sockets. Secondly, packet sampling was implemented at a lower level, which meant that sampled-out packets were not even passed to upper layers. Lastly, multiple applications were able to open several PF_RING sockets simultaneously without interference. However, the main performance limitations remained the same as with Linux NAPI.

The problem with unnecessary copying between DMA-able memory and the shared buffer was already solved when specialised NICs with custom drivers were used. One of the most used cards for this purpose was the DAG NIC [197], which was eventually commercialised by the Endace company [198]. The DAG NIC used an FPGA chip that could be programmed to provide specific functionality, such as high-precision packet timestamping. Moreover, it meant that it was possible for the driver to instruct the card to upload packets in the correct format directly to the buffer that was shared with user-space. Degioanni and Varenni used this approach and proposed a design for distributing packets to multiple threads to better utilise CPU parallelism [199]. However, the card did not support RSS, and thus the single buffer to which the NIC uploaded the packets remained a bottleneck.

To eliminate unnecessary packet copying even without the use of a specialised NIC, Deri developed a new framework called nCap in 2005. The most important innovation was that the standard kernel packet processing was completely bypassed. A specialised driver was developed for compatible NICs (Intel 1 GE and 10 GE adapters were supported) that allowed the NIC to manage the packet reception and transmission buffers directly. These buffers were then mapped directly to user-space, and a user-space library called nCap was used to manage them. This zero copy approach, shown in Figure 5.2, is used in almost all other high-speed packet frameworks as well. The main disadvantage is that it requires specialised or at least modified drivers and a supported NIC. The direct access to packet buffers filled by the NIC was eventually combined with PF_RING into the PF_RING DNA [201] (Direct NIC Access) framework.



Figure 5.2: Packet Transition Through Memory Using a Zero Copy Approach.

The disadvantage of this approach is that only one application at a time can access the packet buffer. When commodity NICs started to support Receive Side Scaling (RSS), the packet capture frameworks started supporting this feature to better utilise multi-core CPUs. Fusco and Deri showed how RSS could be utilised by multi-threaded polling together with the zero copy approach in 2010 [202]. The basic idea was to use one kernel polling thread for each receive (RX) queue and make the associated buffer available to one user-space capture thread. This approach, called TNAPI, was used together with the PF_RING framework and utilised a custom driver for zero copy packet reception. The authors showed that it is important to correctly configure the system to work on the NUMA architecture by setting the CPU affinity of the individual polling and packet capture threads. Therefore, this approach mitigated the limitations of Linux NAPI discussed in the previous section.

The development of the PF_RING, nCap, DNA, and TNAPI frameworks resulted in the creation of the PF_RING ZC (Zero Copy) framework [203]. It uses proprietary drivers to deliver packets to PF_RING using the zero copy approach, supports RSS and the NUMA architecture to spread packet processing across multiple CPU cores and threads, and allows the kernel networking stack to process packets when no PF_RING-aware application is running. Moreover, by managing the PF_RING in the kernel, it supports packet fanout to multiple user-space applications and even virtual machines. The PF_RING ZC also allows user-space applications to receive packets in bursts (batches), which was not possible in vanilla PF_RING. This feature reduces the number of system calls and significantly helps to increase performance, especially after the mitigations for the Meltdown and Spectre attacks made syscalls more expensive [204].

The PFQ and netmap packet capture frameworks were published in 2012. Around the same time, Intel started public development of the Data Plane Development Kit. The PFQ packet capturing engine was introduced by Bonelli et al. in [205]. PFQ supports RSS, both vanilla and modified drivers (for zero copy operation), and utilises packet processing within the NAPI context. The transition of packets from kernel-space to user-space is implemented by a wait-free queue. When packets need to be copied, it is done in batches, which increases performance. The authors showed that the performance of PFQ is better than that of vanilla PF_RING in their setup. PFQ has evolved significantly in recent years. The authors added many features [206, 207, 208], such as support for in-kernel programmable early processing and user-space parallel processing while preserving normal host system network stack operation.

The netmap framework developed by Rizzo is different from PF_RING and PFQ in a fundamental way. The packets are not stored in continuous buffers but rather in separately allocated buffers of fixed sizes. This allows passing packets between interfaces in a zero-copy manner. Moreover, a user-space application can exchange packets with the NIC and the host networking stack independently. This design is useful for applications such as packet routing and forwarding.


Intel’s Data Plane Development Kit [210], known simply as DPDK, is a set of libraries and drivers for fast packet processing. After Intel started the project, its governance came under the Linux Foundation, which unites many companies that contribute to the project. Similarly to netmap, it uses discretely allocated packet buffers instead of a ring buffer. However, user-space communicates directly with the card to eliminate unnecessary system calls. Therefore, when a driver with DPDK support is loaded, the NIC cannot be used together with the standard kernel networking stack. By utilising per-packet buffers, DPDK allows processing packets out of sequence, which is useful, for example, for selective packet capture. Matching packets can be retained while others are quickly discarded. This is not possible with a ring buffer approach without packet copying. DPDK does not support interrupts; therefore, packets are always polled from the NIC. This is effective at high loads; however, it excessively utilises the CPU at lower packet rates. DPDK is widely supported, and specialised drivers exist for many NICs from different vendors [211].

There are other packet capture and manipulation frameworks that were not discussed in this section, such as OpenOnload [212] from Solarflare and PacketShader I/O [213], which uses GPU acceleration to build a fast router. Moreover, manufacturers of (FPGA-based) high-speed NICs such as Endace, Napatech, Myricom, Mellanox Technologies, or Netcope Technologies provide their own drivers and libraries for fast packet processing together with their products. These drivers and libraries utilise approaches very similar, if not identical, to the ones described in this section.

There are several works comparing the performance of different packet capture frameworks. They can be used as a reference when choosing the right framework for a particular use case. Braun et al. evaluate the performance of native FreeBSD and Linux packet capture together with PF_RING on Linux [214]. They identify bottlenecks and provide guidelines for debugging loss of performance. The authors of [184] not only compare the performance of PacketShader, PFQ, and PF_RING, but also provide a detailed description of these frameworks and of Linux NAPI. Wang et al. compare the PF_RING, PF_RING ZC, and DPDK frameworks in [215]. The focus of this comparison is on the correct utilisation of the NUMA architecture. The authors conclude that the difference in thread assignment between the compared frameworks has an impact on the overall performance. However, they fail to realise that processing on one of the NUMA nodes must naturally be faster, since the NIC is directly connected only to this one node. The authors of [216] not only compare the features and performance of PACKET_MMAP, PacketShader I/O, netmap, PF_RING ZC, and DPDK, they also analyse the use of these frameworks with the modular router Click. They extensively discuss the impact of important features such as I/O batching, ring size, execution model, zero-copy, and multi-queuing on the final performance of the Click router. Finally, Gallenmüller et al. propose a model to estimate and assess the performance of various packet processing frameworks in [191]. They apply the model to DPDK, netmap, and PF_RING ZC. They assess the transmission efficiency, the influence of caches, the influence of batch sizes, and the latency of each framework.
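To make the DPDK receive path discussed above concrete, the following sketch, loosely following the structure of DPDK's basic examples, initialises the EAL, sets up one RX queue on port 0, and busy-polls it with rte_eth_rx_burst(). Error handling and a TX path are omitted, and the port and queue numbers are arbitrary choices for the illustration.

```c
/* Sketch of a minimal DPDK receive loop (error handling mostly omitted). */
#include <stdio.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define RX_RING_SIZE 1024
#define NUM_MBUFS    8191
#define BURST_SIZE   32

int main(int argc, char *argv[]) {
    if (rte_eal_init(argc, argv) < 0) {
        fprintf(stderr, "EAL initialisation failed\n");
        return EXIT_FAILURE;
    }

    struct rte_mempool *pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS,
        250, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    uint16_t port = 0;                      /* first DPDK-bound port */
    struct rte_eth_conf port_conf = {0};    /* default configuration */
    rte_eth_dev_configure(port, 1, 0, &port_conf);  /* 1 RX queue, no TX */
    rte_eth_rx_queue_setup(port, 0, RX_RING_SIZE,
                           rte_eth_dev_socket_id(port), NULL, pool);
    rte_eth_dev_start(port);

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        /* Busy polling: rte_eth_rx_burst() never blocks, even when idle. */
        uint16_t nb = rte_eth_rx_burst(port, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb; i++)
            rte_pktmbuf_free(bufs[i]);      /* a flow exporter would parse here */
    }
    return 0;
}
```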

5.3 High-Speed Flow Monitoring

Packet capture is only the first part of the flow monitoring process. After a packet is received by the flow exporter software, it needs to be processed as quickly and efficiently as possible. Packet processing, flow creation, and flow export have been described in Chapter 2, and this section aims to provide an overview of techniques that can be used to accelerate these elements of the flow monitoring process.

We start with a description of techniques that rely on hardware acceleration, which uses special features of the NIC. These features include receive side scaling, discussed in the previous section, and packet preprocessing. Software acceleration techniques are available regardless of the used NIC and involve effective processing on multiple cores and flow creation design.


Hardware acceleration           Software acceleration
Receive Side Scaling            Multithreading
Packet trimming                 NUMA awareness
Packet header preprocessing     Flow state in parsers
Flow processing offloading      Flow cache design
Application identification      Per-flow expiration timeout
                                Delayed packet processing
                                Bidirectional flow records

Table 5.1: Flow Acceleration Techniques (Novel Proposals Are Highlighted).

Table 5.1 provides an overview of the discussed techniques. The highlighted techniques are novel and represent our contribution to the high-speed flow monitoring field. We deliberately avoid the use of sampling since it degrades the quality of exported data, as shown by the authors of [7]. This section provides an overview of the flow monitoring acceleration techniques. However, we do not have explicit data to show the direct impact of the presented techniques on the flow monitoring performance. A case study utilising several of the presented techniques is presented in Section 5.4 instead. This study shows that it is possible to achieve more than 100 Gb/s throughput on a commodity CPU when hardware-accelerated NICs are used and several acceleration techniques are applied.

5.3.1 Hardware Acceleration in Flow Monitoring

Hardware-accelerated flow monitoring is based on offloading parts of the packet or flow processing to specialised hardware, mainly the NIC. The extent of the offloaded work depends on the capabilities of the NIC. A field-programmable gate array (FPGA) is usually used in hardware-accelerated network interface cards. It allows parsing packet headers and performing tasks such as packet trimming, computing hashes to identify flows, or timestamp assignment directly in the NIC. It usually supports transferring data to multiple buffers in RAM so that the software can efficiently use a zero copy multi-threading model. As described in the previous section, there are several manufacturers that provide these cards. However, NICs can support hardware acceleration even without an FPGA chip. Their capabilities are usually more limited and cannot be extended after the card is produced. For example, Intel produces many NICs, such as the Intel XL710 [217], without an FPGA chip, which provide basic hardware acceleration such as packet hashing or even timestamping. The use of commodity cards with advanced features like timestamping or packet filtering was proposed by Deri et al. in [218] in 2013.

We are not the first to propose the use of hardware acceleration in flow monitoring. Antichi et al. proposed to use the NetFPGA platform for passive network measurement in [219]. They note that without hardware support, the packet timestamps have poor accuracy. The use of an FPGA allows timestamping, filtering, or trimming the packets, which saves CPU processing time and allows achieving higher packet rates. However, their work is targeted only at a 1 G environment.

Receive Side Scaling

Receive Side Scaling is supported by almost all modern NICs. It is used by Linux NAPI to store the received packets in different queues in order to utilise multiple threads for packet processing. The NAPI ensures that packets belonging to a single application always arrive in the correct order even if received by different queues.



Figure 5.3: Receive Side Scaling Packet Distribution.

However, the packet capture frameworks do not usually provide such a guarantee. Figure 5.3 shows the principle of RSS. Each packet is processed, and a receive (RX) queue is selected. This allows multiple threads to process packets from different queues without any synchronisation; therefore, the choice of the distribution function is crucial. Many applications, including flow processing, require all packets of the same flow to be processed by the same thread. Therefore, the distribution function must put the packets of the same flow into the same RX queue. To achieve this, the distribution function is usually implemented as a hash function that computes a hash from certain header fields of each packet. Note that in the case of flow monitoring, these fields must be a subset of the flow key; otherwise, the flow could be inadvertently split into several RX queues.

IP addresses and transport layer ports are usually used to compute the hash. However, it is often forgotten that fragmented packets do not always have transport layer information present, which can result in fragmented packets being sent to a different queue than the rest of the flow. Therefore, it is better, in most scenarios, to use only IP addresses to distribute the flows to different queues. Sometimes the application needs to process both directions of a communication, e.g., for biflows. In such a case, the used hash function must return the same value for both directions. We have proposed to simply order the bidirectional fields (IP addresses and ports) by value before applying the hash function; a sketch of this approach follows below. This solution is implemented in the HANIC design [220] of the COMBO card family. Intel NICs allow setting different RSS hash keys to achieve different hash distributions. Although there are several symmetric RSS hash keys that produce the same hash for both directions of the communication, the hash distribution is compromised [221]. Therefore, if the hash is used to identify the flow afterwards, there are many collisions. We found in our experiments that it is more effective to recompute the hash in software in these cases.

A problem that is often encountered when using RSS is that each NIC supports only a specific set of link layer protocols. When an unknown protocol is encountered, all packets that cannot be parsed are sent to a default queue. Therefore, RSS cannot be effectively used for these protocols. FPGA-based cards can be updated with new firmware; however, commodity cards with a fixed set of features should be chosen carefully when an uncommon link layer protocol is used.

The performance improvement achieved when using multiple queues is directly related to the number of queues and the number of available CPU cores. Since most queues are going to be actively polled under high load, using more queues than available CPU cores is not advisable, as it causes the threads to be switched on the CPU. However, too many queues also induce a high load on the memory controller, causing the performance to degrade. It is best to experiment with different numbers of queues to find the optimal value for the target system. We have achieved the best results with 8-16 queues, depending on the specific scenario and the number of CPU cores.
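The following sketch illustrates the ordering idea on a simple software hash: the source and destination fields are sorted by value before mixing, so both directions of a connection map to the same value. The mixing function and constants are illustrative and are not the hash used by the HANIC firmware.

```c
/* Sketch: direction-independent flow hash obtained by ordering the
   bidirectional fields (IP addresses and ports) before hashing.
   The mixing step is illustrative, not the HANIC implementation. */
#include <stdint.h>

struct flow_key {
    uint32_t src_ip, dst_ip;     /* IPv4 for brevity */
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

static uint32_t mix(uint32_t h, uint32_t v) {
    h ^= v;
    h *= 0x9e3779b1u;            /* a common multiplicative constant */
    return h ^ (h >> 16);
}

uint32_t symmetric_flow_hash(const struct flow_key *k) {
    /* Order addresses and ports by value so both directions hash identically. */
    uint32_t ip_lo = k->src_ip < k->dst_ip ? k->src_ip : k->dst_ip;
    uint32_t ip_hi = k->src_ip < k->dst_ip ? k->dst_ip : k->src_ip;
    uint16_t p_lo  = k->src_port < k->dst_port ? k->src_port : k->dst_port;
    uint16_t p_hi  = k->src_port < k->dst_port ? k->dst_port : k->src_port;

    uint32_t h = 0x811c9dc5u;    /* arbitrary seed */
    h = mix(h, ip_lo);
    h = mix(h, ip_hi);
    h = mix(h, ((uint32_t)p_lo << 16) | p_hi);
    return mix(h, k->proto);
}
```

Reducing the resulting value modulo the number of RX queues or worker threads then yields a queue index that is the same for both directions of the flow.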


Packet Trimming

A simple way to lower the memory controller load is to trim the received packets when packet payloads are not required, e.g., for basic flow monitoring. Capturing the first 100 bytes of each packet is usually enough to contain various MPLS, VLAN, Ethernet, IPv6, and TCP headers. The rest of the packet can be discarded. A similar technique can be used when copying packets from the kernel to user-space, as shown in [196]. The author uses libpcap with a snaplen of 128 B, which speeds up packet capture (a minimal example is shown below). However, much better acceleration is achieved when the packets are trimmed by the NIC and only the shortened packets are copied to the RX queues. Although the per-packet processing costs are usually not decreased, this technique significantly helps the memory controller by lowering the data throughput.

The performance improvement achieved by this method depends on the average length of the processed packets. For example, the shortest packets cannot be trimmed at all; therefore, packet trimming would not help in some artificial benchmarks. Moreover, application flow monitoring also requires the packet payload. The amount of data that can be discarded varies depending on the measured application protocols. Unless the NIC has additional knowledge about the measured traffic, it should not trim the packets at all for application flow monitoring.
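For completeness, the software variant of packet trimming amounts to setting the snapshot length when opening the capture handle. The sketch below uses libpcap with a 128 B snaplen; the interface name "eth0" is just a placeholder.

```c
/* Sketch: software packet trimming via the libpcap snapshot length. */
#include <pcap/pcap.h>
#include <stdio.h>

int main(void) {
    char errbuf[PCAP_ERRBUF_SIZE];
    /* 128 B is enough for common L2-L4 headers; payloads are not copied. */
    pcap_t *p = pcap_open_live("eth0", 128, 1, 1000, errbuf);
    if (p == NULL) { fprintf(stderr, "pcap_open_live: %s\n", errbuf); return 1; }

    struct pcap_pkthdr hdr;
    const unsigned char *pkt = pcap_next(p, &hdr);
    if (pkt != NULL)
        printf("captured %u of %u bytes\n", hdr.caplen, hdr.len);

    pcap_close(p);
    return 0;
}
```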

Packet Header Preprocessing

Basic flow monitoring usually utilises only a small subset of packet header fields. Therefore, a more radical way to save memory bandwidth and CPU cycles is to let the NIC extract the information required for flow monitoring and send only a special message with this information to the RAM. This allows the packet processing in the flow exporter to be simplified, since it only needs to read the data from a well-defined structure. Moreover, it also helps the memory controller by sending only the necessary subset of all available data.

This process requires very specialised FPGA-based cards, such as the COMBO family of cards from CESNET. We have successfully used the proposed technique to build a high-density flow monitoring system. The results are presented in Section 5.4.

Even better results could be achieved by closer collaboration between the NIC and the flow exporter. If the NIC could send the data in the exact format that is used by the flow exporter, the data could be used directly without reordering the elements and filling out internal structures of the flow exporter. In any case, this method is not applicable to application flow monitoring, since application layer parsing is too complex and dynamic to be fully handled in the FPGA.
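To give an idea of the kind of well-defined structure the exporter would read in this scheme, the sketch below shows one possible layout of a preprocessed packet-header record. It is a hypothetical example, not the actual message format used by the COMBO cards.

```c
/* Hypothetical layout of a per-packet record delivered by the NIC instead of
   the raw packet; the real COMBO/CESNET format differs. */
#include <stdint.h>

struct nic_packet_record {
    uint64_t timestamp_ns;   /* hardware timestamp of the packet */
    uint16_t frame_length;   /* original frame length in bytes */
    uint8_t  ip_version;     /* 4 or 6 */
    uint8_t  l4_proto;       /* TCP, UDP, ICMP, ... */
    uint8_t  src_ip[16];     /* IPv4 addresses use the first 4 bytes */
    uint8_t  dst_ip[16];
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  tcp_flags;
    uint8_t  reserved[7];    /* explicit padding up to 56 bytes */
};
```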

Flow Processing Offloading

We have shown that packet processing can be offloaded to the NIC almost entirely. However, it is possible to go even further and offload the flow creation process to the NIC as well. It was proposed by Zadnik et al. in [222] to create a flow cache directly in the NIC. However, this approach is not used in practice, as it has two main disadvantages. Firstly, the memory requirements of the flow cache exceed the resources available in most NICs even today. Secondly, flow exporters often support additional on-demand features, such as processing of application layer information or extended monitoring of TCP connection features. These features are hard or even impossible to implement in the firmware of the NICs. To phrase it differently, the NICs are not flexible enough to efficiently keep up with the changing demands placed on flow monitoring.

We have established that packet header preprocessing and packet trimming in the NIC are not usable for application flow monitoring. The main reason is that application layer parsing needs to be done in software. However, most packets do not carry application protocol headers.


These packets can be processed by the NIC, and only the extracted information transferred to software. Moreover, once the important features of each flow are extracted, the rest of the flow creation process usually consists of simply incrementing various counters. Therefore, it is possible to offload this part of the processing to the NIC. Such a solution is proposed in [223] and is called Software Defined Monitoring (SDM). It is based on offloading heavy flows to the NIC. The important information from the application layer is usually transferred in the first N packets, where N can be expected to be lower than 20 in most cases. Therefore, after the software processing encounters the Nth packet, it instructs the NIC to process the rest of the flow; a simplified sketch of this decision is shown below. Only aggregated information about the flow is sent from the NIC to the software. The flow cache in the NIC is rather limited in comparison to the amount of RAM available to the software. However, only heavy flows need to be kept in this cache, and it can be expired more often to reduce memory requirements. The authors show that 85 percent of the traffic is observed in five percent of heavy flows. Therefore, the benefit of SDM has been shown to be the aggregation of 85 percent of packets into flow records in the NIC. This significant acceleration allows application flows to be monitored on 100 Gb/s traffic.

Although SDM is the most efficient solution for offloading part of the flow processing to the NIC, there are disadvantages as well. Once a flow is offloaded to the NIC, software parsers cannot observe further payloads containing subsequent application headers. For example, in the case of the HTTP protocol, HTTP pipelining cannot be detected, and information about further requests and responses in the same connection is lost. Therefore, SDM cannot offload flows that are expected to have important data in the payload of future packets. Another problem can occur when an offloaded flow record is expired by the flow exporter, e.g., due to a collision in the flow cache. In this case, the NIC is instructed to start sending full packets for this flow again. However, there is a delay in this process, and some preprocessed packets might already have been sent to an RX queue. Without an existing flow record, partial information from the preprocessed packets cannot be used to create a new flow record. The reason is that it might be missing some of the link layer information, which is considered to be constant for each flow and thus does not need to be updated with each packet.
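The software-side decision itself is simple, as the following sketch illustrates. The threshold, the flow record fields, and the nic_offload_flow() call are all hypothetical stand-ins; the real control interface to the card is hardware-specific.

```c
/* Hypothetical sketch of the SDM offload decision in the exporter software.
   nic_offload_flow() is a stand-in for the card-specific control interface. */
#include <stdint.h>
#include <stdio.h>

#define SDM_OFFLOAD_AFTER 20            /* the "N" discussed in the text */

struct flow_record {
    uint64_t packets;
    int      offloaded;
    /* flow key, byte counters, application fields, ... */
};

static void nic_offload_flow(struct flow_record *f) {
    /* In a real system this would install a rule on the NIC. */
    (void)f;
    printf("flow offloaded: NIC will aggregate the remaining packets\n");
}

void on_packet(struct flow_record *f) {
    f->packets++;
    /* Application headers are expected only in the first N packets, so the
       remainder of the flow can be counted on the card. */
    if (!f->offloaded && f->packets >= SDM_OFFLOAD_AFTER) {
        nic_offload_flow(f);
        f->offloaded = 1;
    }
}
```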

Application Identification

Packet classification in an FPGA is not a novel idea in itself [224]; however, it has not yet been used together with high-speed application flow monitoring. Since only a small portion of all packets carry the application protocol information necessary for flow monitoring, it is beneficial to recognise such packets in the NIC and mark them for further processing in the software. As the NIC cannot do complex application header processing, only a simple mechanism, such as pattern matching, can be utilised for packet classification. The work of the WAND Research Group [100] shows that it is possible to achieve reasonable traffic classification accuracy using only the beginning of a packet payload.

Once the packets carrying application protocol headers are marked, the software parsers can process only these important packets. Moreover, other packets can be trimmed or parsed in the NIC, and only the important information can be passed for processing in the software. Packet classification can also benefit the SDM. When a flow is offloaded to the NIC and a packet with an application protocol header is matched, the preprocessing of such a flow can be terminated and full packets returned to the software. This approach would effectively solve the above-mentioned problem of HTTP pipelining.

The accuracy and usability of pattern matching for packet classification depend on the target application protocols and on the resource limitations of the NIC as well. Further research is needed to determine whether this method can be used for real-world application flow monitoring.


Figure 5.4: Multithreaded Flow Monitoring on a Single Input Queue.

5.3.2 Flow Exporter Software Optimization

The performance of flow monitoring can be greatly enhanced by offloading as much of the necessary packet processing as possible to the NIC. However, to build a high-performance flow monitoring solution, the software processing must be carefully tuned as well. Moreover, FPGA-based NICs are much more expensive than commodity NICs and cannot always be deployed. Therefore, in this section we focus on flow monitoring optimisations that can be achieved by careful software design and system configuration.

Multithreading

The development of modern CPUs has a tendency to substitute raw power (frequency) for a higher number of CPU cores. To fully utilise the potential of multi-core CPUs and achieve high flow monitoring performance, it is necessary to create multithreaded flow monitoring software. The three main parts of the flow monitoring process can be easily executed in different threads. The first thread receives packets and parses the packet headers. The parsed data are passed to the flow creation thread that manages the flow cache. When a flow is exported from the flow cache, it is passed to the third thread that manages flow export. Figure 5.4 illustrates this execution model, which uses 3 threads in total. For NICs with RSS support, the flow processing can utilise more than a single queue. The easiest approach is to have different threads process packets from different buffers. The resulting flow records are aggregated either in a single flow cache or in multiple flow caches, one per thread, as shown in Figure 5.5. In either case, it must be ensured that there is a single thread used for exporting the resulting flow records, where the data from multiple processing threads are merged. The latter approach is more effective as it clearly prevents the flow cache from becoming a point of contention. However, when there are multiple flow caches, it is necessary to ensure that each flow is aggregated in a single flow cache, as previously described in the passage about Receive Side Scaling. Using this approach, N + 2 threads in the first case or 2N + 1 threads in the second case can be utilised, where N is the number of buffers; the last thread is used for flow export.


Figure 5.5: Multithreaded Flow Measurement with RSS Support: a) Single Flow Cache; b) Multiple Flow Caches.

Using 2N + 1 threads for flow processing results in 17 threads for 8 RX queues. On a system with a lower number of cores, it is desirable to use fewer threads so that each active thread has a dedicated core to itself. This avoids the necessity of executing different processes and threads on the dedicated cores, which increases performance. Moreover, having the same data processed by multiple threads increases the load on the memory controller, which can become overloaded for a large number of threads. Therefore, lowering the number of cores used per RX queue can, in fact, help the performance. By combining packet processing and the flow cache into a single thread, the total number of threads can be lowered to just N + 1 while retaining multiple flow caches. Whether to combine the threads or not depends on the throughput of the memory controller, the CPU frequency, the number of cores, and the actual amount of work performed by each thread. The measurement framework proposed by the authors of [191] can be used to determine the characteristics of the system. Another reason to keep packet parsing and the flow cache in a single thread is that the packet parsing and processing of application protocols may require the state of the connection to be kept. In this case, the flow cache record must be made available to the packet processing thread. If the threads were kept separate, it would result in the use of synchronisation primitives and an overall performance decrease. One possible solution to this problem is to keep a different, smaller cache for chosen flow records directly in the packet processing thread. However, it is much simpler to use the flow cache directly when the packet parsing and flow cache are managed in a single thread.
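The N + 1 threading model described above can be outlined with a short pthreads skeleton. It is only a structural sketch: the queue, flow cache, and export logic are left as placeholder comments, and the fixed queue count is an assumption.

    /* Sketch of the N + 1 threading model: one combined parsing/flow-cache
     * worker per RX queue plus a single export thread. Queue and cache
     * handling is left as placeholders. */
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_QUEUES 8

    struct worker_ctx {
        int queue_id;      /* RX queue (and private flow cache) owned by this worker */
    };

    static void *worker_main(void *arg)
    {
        struct worker_ctx *ctx = arg;
        /* In a real exporter: read packets from RX queue ctx->queue_id,
         * parse headers, and update this worker's private flow cache.
         * Expired records would be handed to the export thread over a queue. */
        printf("worker %d started\n", ctx->queue_id);
        return NULL;
    }

    static void *export_main(void *arg)
    {
        (void)arg;
        /* Single unifying thread: receives expired flow records from all
         * workers and exports them as NetFlow/IPFIX messages. */
        printf("export thread started\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t workers[NUM_QUEUES], exporter;
        struct worker_ctx ctx[NUM_QUEUES];

        for (int i = 0; i < NUM_QUEUES; i++) {
            ctx[i].queue_id = i;
            pthread_create(&workers[i], NULL, worker_main, &ctx[i]);
        }
        pthread_create(&exporter, NULL, export_main, NULL);

        for (int i = 0; i < NUM_QUEUES; i++)
            pthread_join(workers[i], NULL);
        pthread_join(exporter, NULL);
        return 0;
    }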

NUMA Awareness

To build a high-speed flow monitoring system, servers with multiple CPUs are often used. However, special care must be taken to deploy such a configuration properly. Modern systems use a Non-Uniform Memory Access (NUMA) architecture where each CPU has direct access to a single local memory bus, and a high-speed interconnect (such as QPI) must be used for communication between different nodes (a CPU reading from memory to which it has no direct access).

Moreover, each NIC is directly connected to only one CPU. Therefore, allocating DMA buffers for packet reception in remote memory is ineffective, as it requires using the high-speed interconnect between the CPUs. The general rule is to avoid using remote memory as much as possible. Therefore, the buffers for the RX queue should be allocated on the CPU connected to the NIC. However, when multiple CPUs are used for processing packets, it makes sense to distribute the buffers to those CPUs. The threads processing packets should always run on a local core, which can be achieved by setting the affinity of the thread to the CPU of the specific core. When starting a new thread, it is important to set its affinity first and only then allocate memory, because memory is allocated on the local node by default. If the thread is migrated to a different CPU, the local memory becomes remote. If biflow creation is not required, it is possible to utilise two NICs to process different directions of the traffic. Each can be connected to a different CPU, which eliminates the need to share data between CPUs almost entirely.
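The affinity-before-allocation rule can be illustrated with a short sketch using the standard Linux and libnuma interfaces; the helper function below is a hypothetical illustration, not part of any particular exporter.

    /* Sketch of NUMA-aware thread start-up: pin the thread to a core local to
     * the NIC first, and only then allocate its buffers, so the memory ends up
     * on the local node. Requires libnuma (-lnuma) on a Linux system. */
    #define _GNU_SOURCE
    #include <numa.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stddef.h>

    void *start_rx_worker(int core_id, size_t cache_bytes)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);

        /* 1. Pin the calling thread to the chosen core before touching memory. */
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        /* 2. Allocate the flow cache explicitly on the node local to that core,
         *    so later accesses do not have to cross the QPI interconnect. */
        int node = numa_node_of_cpu(core_id);
        void *flow_cache = numa_alloc_onnode(cache_bytes, node);

        return flow_cache;   /* to be released later with numa_free() */
    }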

Flow State in Parsers

It is not effective to try every available application header parser on every packet to see which one is able to process it. When a packet from a flow has already been matched by some application parser, it will usually not be matched by others. Therefore, by keeping information about which parser should process each flow, most of the unnecessary and possibly expensive calls to application parsers can be eliminated. Furthermore, when a flow is not yet matched by any application parser, it is best to execute the application protocol parsers from the most common to the least common protocol. When the flow state is known to the parser, an optimisation similar to the SDM can be deployed. When an application parser detects that it has already obtained all important information from a given flow, it can instruct the exporter to skip processing of the application layer for this flow. This way, the different application parsers can opt out of processing individual flows. This optimisation can be applied especially to traffic encrypted using the SSL/TLS protocol. It is quite easy to detect TLS encryption from the specific format of its handshakes. Without the means to decipher the traffic, other application parsers cannot process the encrypted packets. Therefore, all other application parsers can be skipped when the SSL/TLS protocol is detected. Since HTTPS adoption is steadily increasing [225], the number of SSL/TLS connections rises as well; therefore, this optimisation is likely to have a growing impact in the future.

Flow Cache Design

One of the most performance-critical parts of any flow monitoring software is the flow cache. The cache needs to be designed to hold hundreds of thousands of flow records, perform fast inserts, lookups, and updates, and manage record expiration after active and inactive timeouts. It also needs to be robust and resilient enough to handle excessive workloads during DDoS attacks [226]. The flow cache design has been studied in the literature [60, 185]. Although the authors propose using various data structures such as linked lists, trees, or multidimensional hashing tables, our experience with a hash table indicates that the simplest structure provides the best performance. The flow cache has to maintain data locality to make good use of CPU caches; therefore, the dynamic structures do not perform as well as a simple hash table. To avoid collisions, the flow record can be placed in any of the N positions after the computed one. Therefore, the record is always found in constant time and no overflow structures are necessary. However, the flow cache utilisation never effectively reaches 100 percent, as that would cause too many collisions.


It is best to estimate the maximum number of flow records that need to be kept at a time and choose the cache size so that there are no, or only a few, collisions for this number of records. Each collision necessitates a forceful export of a flow record, which leads to performance degradation. The implementation of active timeouts is straightforward. Each time a packet is being added to a flow record, the start time of the flow record is compared to the packet timestamp. When the difference is greater than the active timeout, the flow record is exported and a new flow record is created in its stead. However, inactive timeouts are more complex to implement effectively. Different algorithms using time-ordered linked lists and least recently used structures have been proposed in the literature. However, the management of such structures is costly. A straightforward approach would be to start an auxiliary thread that searches the cache for any flow records that need to be expired. However, this would cause heavy locking on the flow cache, which is undesirable as well. The most efficient way to implement inactive timeouts on a hash table flow cache is to have the thread managing the flow cache check several records each time a packet is received. This approach avoids locking the flow cache and performs well. However, the export of inactive flow records may be slightly delayed. Moreover, the search for expired flows must be performed even when no new packets are received. Carefully balancing the flow cache management tasks is a complex problem which offers considerable potential for further research. The length of the flow records that are kept in the cache has an impact not only on the size of the flow cache but on the performance as well. Overly long flow records cause more cache misses while working with the cache. For example, when a URL is extracted from an HTTP application header, it can be thousands of bytes long. Reserving a place for this kind of information in the flow record is impractical and hurts performance. Our measurements show that it is faster to dynamically allocate memory for application data of a certain size and store only a pointer in the flow record itself. The less common the application is, the more beneficial it is to allocate the space for its data dynamically, since only a small number of these allocations is performed, compared to making every flow record significantly larger.
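The following sketch illustrates a flow cache of this kind: a hash table with bounded linear probing, forced export on collision, and lazy inactive-timeout checks performed on every lookup. The sizes, the FNV-1a hash, and the export_record() stub are illustrative assumptions, not the design used in any particular exporter.

    /* Sketch of a hash-table flow cache with bounded linear probing and lazy
     * inactive-timeout expiration. */
    #include <stdint.h>
    #include <string.h>

    #define CACHE_SIZE   (1u << 20)      /* number of slots (power of two)   */
    #define PROBE_DEPTH  8               /* bounded probing: constant lookup */
    #define LAZY_CHECKS  4               /* slots examined per packet        */
    #define INACTIVE_SEC 30

    struct flow_key { uint8_t bytes[13]; };   /* packed 5-tuple */

    struct flow_slot {
        struct flow_key key;
        uint64_t last_seen;
        uint8_t  used;
        /* ... counters and application fields ... */
    };

    static struct flow_slot cache[CACHE_SIZE];
    static uint32_t sweep_pos;           /* cursor for lazy expiration */

    static uint32_t hash_key(const struct flow_key *k)
    {
        uint32_t h = 2166136261u;        /* FNV-1a over the flow key */
        for (unsigned i = 0; i < sizeof(k->bytes); i++)
            h = (h ^ k->bytes[i]) * 16777619u;
        return h & (CACHE_SIZE - 1);
    }

    static void export_record(struct flow_slot *s)
    {
        (void)s;   /* in a real exporter, the record is queued for NetFlow/IPFIX export */
    }

    struct flow_slot *cache_get(const struct flow_key *k, uint64_t now)
    {
        uint32_t base = hash_key(k);
        struct flow_slot *victim = &cache[base];

        /* Lazy inactive-timeout check: a few slots are swept on every call,
         * so no separate expiration thread or flow-cache locking is needed. */
        for (unsigned i = 0; i < LAZY_CHECKS; i++) {
            struct flow_slot *s = &cache[sweep_pos++ & (CACHE_SIZE - 1)];
            if (s->used && now - s->last_seen > INACTIVE_SEC) {
                export_record(s);
                s->used = 0;
            }
        }

        /* Bounded linear probing: the record is in one of PROBE_DEPTH slots. */
        for (unsigned i = 0; i < PROBE_DEPTH; i++) {
            struct flow_slot *s = &cache[(base + i) & (CACHE_SIZE - 1)];
            if (s->used && memcmp(&s->key, k, sizeof(*k)) == 0)
                return s;                          /* existing record */
            if (!s->used) { victim = s; break; }   /* free slot found */
        }
        if (victim->used)
            export_record(victim);                 /* collision: forceful export */
        memset(victim, 0, sizeof(*victim));
        victim->key = *k;
        victim->used = 1;
        victim->last_seen = now;
        return victim;
    }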

Per-Flow Expiration Timeout

The flow cache inactive timeout expiration has a large impact on the performance, as discussed in [62, 56]. However, the value of the timeout is usually the same for all flows. The authors of [62] show that different classes of traffic could use a different inactive timeout. To implement this in application flow monitoring, we propose that the timeouts should be configurable per flow. When a flow is successfully processed by an application parser, a timeout appropriate for this application should be assigned. This approach should decrease the memory requirements and increase the overall performance of the flow monitoring. Flows that consist of only a single packet represent a special traffic class. If the flow exporter is able to detect and decide that the first packet is also the last one, the flow can be directly exported and does not need to enter the flow cache at all. This is especially useful for coping with DDoS attacks, as described by Sadre et al. in [226]. Moreover, it could be used for monitoring DNS traffic as well, since most requests and responses consist of only a single packet.
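A minimal sketch of per-application inactive timeouts is given below; the protocol list and the timeout values are illustrative assumptions, not measured recommendations.

    /* Sketch of per-flow inactive timeouts: once an application parser
     * matches a flow, a timeout appropriate for that application is used. */
    #include <stdint.h>

    enum app_proto { APP_UNKNOWN = 0, APP_DNS, APP_HTTP, APP_TLS, APP_COUNT };

    static const uint32_t inactive_timeout_sec[APP_COUNT] = {
        [APP_UNKNOWN] = 30,   /* generic default                       */
        [APP_DNS]     = 5,    /* request/response pairs finish quickly */
        [APP_HTTP]    = 15,
        [APP_TLS]     = 60,   /* long-lived encrypted sessions         */
    };

    struct flow_record {
        enum app_proto app;
        uint64_t last_seen;   /* timestamp of the last packet (seconds) */
    };

    /* Returns nonzero if the record should be expired at time 'now'. */
    int flow_is_inactive(const struct flow_record *fr, uint64_t now)
    {
        return now - fr->last_seen > inactive_timeout_sec[fr->app];
    }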

Delayed Packet Parsing

Processing the application layer is often a complex task that needs to be performed in the most performance-sensitive situation, during packet parsing. However, the extracted information is often just stored in the flow record and is not used in the flow monitoring process itself. We have already discussed that in some cases it is preferable to allocate memory for application data dynamically. We can take this concept further and extract only the necessary information during packet parsing (e.g., the HTTP method) and store a pointer to a copy of the rest of the application data from the packet.

The rest of the data can be processed in a less critical section of the flow monitoring process, such as in the flow export thread. Delayed packet parsing improves performance especially in cases when the application data are too large to be kept directly in a flow record and memory allocation is performed in any case. As always, the impact of this optimisation needs to be carefully examined, as it depends on the utilisation of the individual threads and the composition of the monitored traffic.

Bidirectional Flow Records

Biflow processing can improve performance. The reason is that the shared data (IP addresses, protocol, and ports) are stored only once, which reduces the memory footprint. Moreover, since both sides are often communicating at the same time, there is an improved chance that the bidirectional flow record will still be cached by the CPU. Even when uniflow records are required by the flow data processing software, biflows can be used and then converted to uniflows during the flow export process. There are many other optimisations that can be performed to increase application flow monitoring performance, such as efficient flow key computation, packet processing in batches, ensuring CPU cache line alignment of flow records, and so on. However, they are mostly code micro-optimisations no different from fine-tuning any other high-performance application and are out of the scope of this thesis.
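The biflow record sharing discussed above relies on both directions of a connection mapping to the same cache entry, which can be achieved with a canonical biflow key such as the one sketched below (a simplified IPv4-only illustration; the endpoint-ordering rule is an assumption).

    /* Sketch of a canonical biflow key: both directions of a connection map
     * to the same key by ordering the endpoints, so shared fields are stored
     * only once. The 'reversed' flag is not hashed; it only tells which
     * direction's counters to update. */
    #include <stdbool.h>
    #include <stdint.h>

    struct biflow_key {
        uint32_t ip_a, ip_b;      /* ip_a:port_a is always the "lower" endpoint */
        uint16_t port_a, port_b;
        uint8_t  proto;
        bool     reversed;        /* true if the observed packet was B -> A */
    };

    struct biflow_key make_biflow_key(uint32_t src_ip, uint16_t src_port,
                                      uint32_t dst_ip, uint16_t dst_port,
                                      uint8_t proto)
    {
        struct biflow_key k = { .proto = proto };

        /* Order the endpoints so that both directions hash to the same record. */
        if (src_ip < dst_ip || (src_ip == dst_ip && src_port <= dst_port)) {
            k.ip_a = src_ip; k.port_a = src_port;
            k.ip_b = dst_ip; k.port_b = dst_port;
            k.reversed = false;
        } else {
            k.ip_a = dst_ip; k.port_a = dst_port;
            k.ip_b = src_ip; k.port_b = src_port;
            k.reversed = true;
        }
        return k;
    }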

5.4 High-Density Flow Monitoring

Current commodity hardware for high-speed network traffic flow monitoring usually consists of a PC with one or two network interface cards (NICs) with a 10 G Ethernet interface. The role of the hardware is merely to capture the data from the network. The hardware setup is supplemented by open- or closed-source software performing the subsequent steps of the flow monitoring process. While one can deploy multiple such probes for monitoring multiple lines, it may not be the best option because of increased power consumption, a large rack space footprint, and the general complexity of managing the monitoring system. Our work, therefore, focuses on exploring the scalability of this concept in a single PC from the performance point of view. There are obviously concerns about various possible bottlenecks of a single PC setup. To name a few: the throughput of the NIC, the system bus (i.e., PCI Express) and PC memory subsystem, the performance of a single CPU core, and the overhead of potential inter-core or inter-CPU communication. By using custom-built NICs, we are able to perform experiments at speeds beyond what is available on the commodity hardware market. The programmability of our FPGA-based NICs allows us to examine the effect of various additional hardware features, such as packet header preprocessing or packet trimming. We suggest that some of these features should be included in the next generation of commodity NICs to aid the performance of future network traffic monitoring systems. The aim of our work described in this section is to identify bottlenecks and potential for new features of current and near-future monitoring systems, as well as to demonstrate the limitations of current CPUs and, generally, the PC architecture in the particular use case of network flow monitoring. We test a monitoring setup that can be achieved using commodity NICs; then we present the impact of the features provided by our NIC. Last but not least, we compare two different CPUs under the same conditions to determine the influence of the CPU choice on the monitoring performance. This section is structured as follows: Subsection 5.4.1 describes our monitoring architecture and how it differs from current commodity NICs. Subsection 5.4.2 explains our experiments and methodology, while Subsection 5.4.3 presents the obtained results. Subsection 5.4.4 draws conclusions from the results.


5.4.1 Monitoring Architecture

This subsection describes our monitoring setup from hardware to software. We briefly introduce our custom-built FPGA-based NIC architecture. We utilise these NICs to incorporate multiple 10 G interfaces in one box.

Hardware

We use the COMBO-80G card [227] to receive the network traffic. The card is equipped with two QSFP+ cages, each of which can be set to 4×10 G Ethernet mode, thus creating eight 10 G interfaces in total. The packets undergo configurable processing in the FPGA and are sent to the host RAM via the PCI Express gen3 x8 bus. The theoretical throughput of this bus is 64 Gbps, but due to the protocol overhead, the real-world throughput is slightly higher than 50 Gbps. Figure 5.6 shows the functional scheme of the described hardware. The FPGA firmware of the COMBO-80G has several features that set it apart from conventional NICs and help to improve the general throughput when receiving packets. The card automatically assigns a 64-bit timestamp to each packet at the moment of reception. The timestamp has a resolution of one nanosecond, which is better than what can be achieved by assigning it later in the software application. Also, the processor load associated with the timestamp generation is removed. Another feature is configurable packet trimming. The card can be set to trim the packets to a predefined length, thus saving PCI Express and memory subsystem bandwidth, most notably for long packets. This feature is clearly intended for the purpose of flow monitoring, since the relevant information (the packet header) is at the beginning of each packet. A further extension of this feature leads to packet parsing directly by the card and transferring only the parsed packet header fields to the host RAM. In addition to the bandwidth saving, the processor load is also reduced, because the processor no longer needs to perform the packet parsing operation. The parsed fields are sent in the so-called Unified Header (UH) format, which has a fixed structure and thus removes the need for complicated parsing conditions in the corresponding processor code. To better utilise current multicore CPUs, the firmware features an option to distribute the packets into eight independent DMA channels. The target channel number of each packet is computed by hashing several fields of the packet header, such as IP addresses, protocol, and port numbers. This ensures that there is a consistent mapping of network flows to the DMA channels so that the software threads always see complete flows. In common traffic with a large number of flows, the traffic is evenly distributed among the DMA channels.
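The flow-consistent distribution can be illustrated by the following sketch; the hash function is an illustrative placeholder, not the one implemented in the COMBO-80G firmware.

    /* Sketch of flow-consistent channel selection: hashing the flow key
     * fields guarantees that all packets of a flow end up in the same DMA
     * channel and are therefore seen by the same software thread. */
    #include <stdint.h>

    #define DMA_CHANNELS 8

    uint8_t select_dma_channel(uint32_t src_ip, uint32_t dst_ip,
                               uint16_t src_port, uint16_t dst_port,
                               uint8_t proto)
    {
        /* Simple multiplicative mix of the 5-tuple fields. */
        uint32_t h = src_ip ^ dst_ip;
        h = h * 31 + (((uint32_t)src_port << 16) | dst_port);
        h = h * 31 + proto;
        return (uint8_t)(h % DMA_CHANNELS);
    }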

Software

The DMA transfers themselves are simplified and optimised to bypass the OS kernel network stack. Large ring buffers are allocated in RAM during the OS driver initialisation, and the packets are then uploaded to these buffers by the card almost autonomously. The only communication between the driver and the card is an exchange of buffer start and end pointers (which mark the empty and free space in the buffer) and configurable interrupts generated by the card. The OS driver never touches or even copies the packets; it only maps portions of the buffers to user-space applications. This way, the overhead of software processing is minimised and maximum CPU time is left for the application itself. Directly mapping the ring buffer memory to user-space also ensures that the data are copied only once, from the NIC to the RAM. This decreases the load on the memory subsystem, which helps to increase the overall performance.


Figure 5.6: Hardware Scheme of Single Card Setup.

Data from the ring buffers are processed by a flow exporter. We use the FlowMon exporter [99] software, which utilises a multithreaded design to distribute the computational load over multiple CPU cores. The flow exporter architecture is shown in Figure 5.7. Input threads read packets from the ring buffers and parse L2 to L4 headers to create partial flow records. These flow records are passed to a flow cache which performs flow record aggregation. Expired flow records are exported using a single unifying thread, which accepts the flows from all flow caches. A single thread is enough for this task since it is much less performance-demanding.

5.4.2 Methodology

This subsection describes the setup for our experiments. We present our testbed as well as the network traffic data that were used to measure the monitoring performance. Successfully processed packets per second and CPU utilisation are measured to quantify the performance.

Testbed Setup

The testbed consists of two devices. The first is the flow monitoring probe and the second is a packet generator which generates traffic that is measured by the probe. We use a Dell PowerEdge R720 server with two Intel Xeon E5-2670 v1 CPUs as the flow monitoring probe. Each CPU features eight physical cores (16 with hyperthreading), a 2.6 GHz operating frequency (3.0 GHz to 3.3 GHz in Turbo mode), and 20 MB of cache. The maximum TDP of each CPU is 115 W.


Figure 5.7: Flow Exporter Multithreaded Architecture.

The memory controller has a throughput of 51.2 GB/s. Each CPU has four RAM modules available, running at 1600 MHz with a capacity of 8 GB per module (64 GB in total). Two COMBO-80G cards are used to receive and process packets. The operating system is a standard Scientific Linux release 6.5 running kernel version 2.6.32-431. The whole setup is a 2U standard rack-mount PC. Since the probe has two CPUs, we need to consider the NUMA-specific setup. Each CPU has a directly accessible portion of the RAM, which corresponds to the four physical RAM modules associated with the CPU. Accessing the memory of the other CPU is more costly, since the data need to be passed between the processors using the QuickPath Interconnect (QPI) bus [183]. Each PCI Express bus is also connected to one CPU. This CPU receives the interrupts of the connected devices, and the devices can write directly to the associated memory using DMA transfers. Therefore, the optimal setup is to have each COMBO-80G card on a PCI Express bus connected to a different CPU. The memory allocated by the drivers for the NIC should belong to the same CPU, and the flow exporter should also run on this particular CPU. This way, we can almost entirely avoid QPI communication when processing the network traffic, which leads to higher performance. However, the current version of the NIC drivers does not support NUMA-specific memory allocation, which causes inefficient memory access through the QPI bus. The impact of this deficiency is described in Subsection 5.4.3. We run one instance of the flow exporter on each physical CPU. This allows each exporter to use the memory of that CPU for the flow cache and therefore avoid accessing the flow cache data through QPI.

Data Generator Setup

The test traffic is generated by a Spirent TestCenter hardware generator at the speed of 10 Gbps and is replicated to all sixteen input ports. Since the data on all 10 G ports are the same, we use the interface numbers in the flow creation process to ensure that the same packets from multiple interfaces will create different flows. It also ensures that the timestamps seen by the flow exporter are monotonic. We use an artificial traffic pattern for the flow monitoring performance measurement. This approach has several advantages. Firstly, we can test the worst-case scenario using short packets to achieve the highest packets-per-second rate. Secondly, it is easy to repeat the tests in another laboratory.


Packet size   Flow count   Packets/s    Bits/s
64 B          128          14,880,545   7,618,839,040
64 B          16,384       14,880,545   7,618,839,040
64 B          131,072      14,880,545   7,618,839,040
128 B         128          8,445,715    8,648,411,900
128 B         16,384       8,445,715    8,648,411,900
128 B         131,072      8,445,715    8,648,411,900
256 B         128          4,528,861    9,275,108,100
256 B         16,384       4,528,861    9,275,108,100
256 B         131,072      4,528,861    9,275,108,100
512 B         128          2,349,560    9,623,786,500
512 B         16,384       2,349,560    9,623,786,500
512 B         131,072      2,349,560    9,623,786,500

Table 5.2: Combinations of Packet Lengths and Flow Counts Used in the Tests.

Moreover, it is infeasible to simulate real traffic using packet generators, especially at 10 Gbps. To compensate for the different numbers of concurrent flows and the real traffic distribution of packets in flows, we repeat the tests with several different flow counts. All generated data are simple UDP packets, and flows are created by permuting parts of the IP addresses. The results provided in Subsection 5.4.3 are averaged over several measurements taken for each scenario. Table 5.2 shows the combinations of packet sizes and flow counts used in our measurements. The corresponding packet and bit rates are also shown. The varying bit rates are caused by the Ethernet protocol overhead, which is lower for longer packets. All values are given for a single 10 G packet generator. Since the traffic is replicated to every input interface, the actual flow count and traffic rates sent to our monitoring device are 16 times higher. The flow count has a high impact on the flow cache performance. With an increasing number of active flow records, the memory is accessed more randomly and the CPU experiences more cache misses, which results in a higher latency of memory accesses and therefore in lower performance. It is difficult to estimate the number of flows to simulate from a real network, since the real flow records are not updated periodically and the number of active flows in the flow cache at any given moment depends heavily on the active and inactive flow cache timeouts and other software settings. Therefore, we test several different options to estimate the influence of the flow count on the overall performance. We believe that the highest flow count used in our tests is more performance-demanding than real traffic in most networks.

5.4.3 Results

The results of our experiments are presented in this subsection. We show the monitoring performance in a setup achievable with commodity NICs, and then we present the improvements achievable with our NICs. Moreover, we show the difference in performance between two different CPUs. We use packets per second as the measure of system performance. Bytes or bits per second can be obtained easily by multiplying the number of packets per second by their respective sizes.


Basic Performance

First, we have measured the basic performance of the flow exporter on full packets. Figure 5.8 shows the packet rates separately for each of the cards. We group the rates for the same packet lengths together. Each group has three pairs of columns. Each colour represents a different number of flows per interface. Shades of the colour differentiate the NICs. The Maximum Ethernet column is the highest achievable aggregated throughput of all 16 Ethernet input interfaces. The Maximum PCIe column has been measured by counting the packets received by the simplest possible software application (a packet counter). This way, we get an upper limit on the number of packets that the flow exporter can receive via the system bus.

Figure 5.8: Full Packet Processing Performance in Packets/s.

We plot the CPU utilisation in Figure 5.9. The interpretation of the graph is almost the same as for Figure 5.8, with the exception that the values for both cards are summed up and the utilisation is shown for the different flow exporter threads instead. The darker colour represents the input thread and the lighter colour marks the values for the flow cache thread.

Figure 5.9: Full Packet Processing CPU Utilization.


There are several important observations that can be made from these two graphs. The performance impact of the number of flows is significant, especially for short packet lengths. This is caused by the high number of updates that must be performed in the flow cache. For a smaller number of flows, the CPU can keep a larger portion of the flow records cached in faster memory. As the number of flows increases, the memory is accessed more randomly, which causes more cache misses and eventually a performance decrease. Figure 5.9 clearly shows that the utilisation of the flow cache thread rises with the number of flows for all packet lengths. The performance of the second card is always higher than that of the first card. We attribute this to the device driver, which allocates packet buffers without considering the heterogeneity of the NUMA architecture. In our case, the first card and the corresponding flow exporter always access the RAM through the QPI bus using the memory subsystem of the other CPU, which decreases the performance. Although it seems that the performance is decreasing for longer packets, this depends on the point of view. The number of bytes per second is actually increasing for longer packets. Since fewer packets need to be processed to achieve the same throughput, the CPU utilisation decreases. However, more data are processed, which requires a higher memory controller throughput. Therefore, for longer packets, the performance is not hindered by insufficient CPU frequency, but by high memory bus utilisation.

Hardware Accelerated Performance

Secondly, we have measured the performance using hardware acceleration. Two different acceleration methods have been used. The first method is to set up the COMBO-80G cards to trim the packets to 64 bytes. This allows keeping the processing speed the same for all packet lengths. The second method uses the internal parser of the NIC and sends only a predefined data structure, called the Unified Header, with the parsed information to the software. The main advantage is that the flow exporter does not have to parse the packet headers and therefore saves some CPU time. The disadvantage of both methods is that the flow exporter cannot perform any payload-dependent analysis. Figure 5.10 shows the performance comparison of the processing of full packets, trimmed packets, and Unified Headers. We plot the graphs for 16,384 flows per interface, but the results are very similar for other flow counts. Note the throughput of the PCI Express bus. Using the Unified Headers decreases the number of bytes that must be transferred for each packet. Therefore, the throughput is higher even for the shortest packets. Using trimmed packets does not achieve any improvement for the shortest packets. However, it allows transferring almost the full Ethernet speed for 128 B packets to the software. A full line rate transfer can be achieved for 256 B packets using packet trimming, or even for 128 B packets using the Unified Headers. The CPU utilisation, shown in Figure 5.11, should be the same for full and trimmed packets at the same packet rate. The difference is caused by the fact that a higher packet rate can be achieved using trimming. Utilising the Unified Headers brings the advantage of easier data processing. Therefore, the input thread is always less utilised for the Unified Headers than for trimmed packets. Using the Unified Headers also brings higher performance. The parsing of the more complex L3 and L4 headers also causes memory accesses. When it is reduced, more bandwidth is left for the flow cache, which increases the overall performance.

Impact of CPU Choice

We have also investigated the impact of the choice of CPU on the packet processing performance. We performed the same tests on the same server with different CPUs. We chose E5-2620 v1 CPUs, which represent a cheaper and therefore slower alternative. Each CPU features six physical cores (12 with hyperthreading), a 2.0 GHz operating frequency (2.5 GHz in Turbo mode), and 15 MB of cache. The maximum TDP of each unit is 95 W. The memory controller has a throughput of 42.6 GB/s.


Figure 5.10: Packet Processing Performance Comparison in Packets/s for 16,384 Flows per Interface.


Figure 5.11: Packet Processing Comparison Using CPU Utilization for 16,384 Flows per Interface.

Figure 5.12 shows a comparison of the performance for the trimmed packets method. The darker colours are used for the faster CPU. The difference is bigger for the smaller number of flows and is reduced with the growing utilisation of the flow cache. On 64 B packets, the performance drop using the slower CPU is 29 % for 128 flows, 23 % for 16,384 flows, and 21 % for 131,072 flows. We assume that the main difference in the performance is caused by the number of CPU cores. Since there are eight independent DMA channels for the data transfer, the best performance is achieved by eight threads processing the data. However, when only six cores are available, the threads must share the cores and the overhead of switching the threads increases. The difference in frequency helps mostly with parsing the data; therefore, it is useful mainly on full packets.


Figure 5.12: Trimmed Packets Processing Performance Comparison for Two Different CPUs.

Another bottleneck is caused by the memory controller, since the flow cache updates generate a large number of random memory accesses. Using a CPU with a wider memory bandwidth helps to alleviate this issue.

5.4.4 Conclusions

The demand for high-density network flow monitoring solutions is increasing. We have built a 2U standard rack server with two COMBO-80G cards. Each card has two 40 G interfaces and can be used in 8×10 G mode. Therefore, the server is theoretically capable of monitoring sixteen 10 G lines, 160 Gbps in total. We have used a packet generator to create traffic with different properties and used this traffic to test the capabilities of the monitoring solution. We have worked with different packet lengths and different numbers of active network flows. Firstly, we have measured the monitoring throughput on full packets, which is a setup that can be achieved using any commodity card with a timestamping unit and an effective distribution among CPU cores. We have shown that the monitoring performance does not achieve the highest possible rate for short packets. However, the maximum rate allowed by the PCI Express bus is achievable for longer packets. There are several caveats that need to be kept in mind while working with the NUMA architecture. Each NIC is connected to one CPU and should store packets in the memory of that CPU. This needs to be enforced by the driver of the NIC. Consequently, the flow exporter for the NIC should also run on the same CPU and work with the corresponding memory. The drivers that we use are not NUMA-aware, and this deficiency is clear from the results. Secondly, we have shown the impact of hardware acceleration on the flow monitoring. Using packet trimming ensures that the packet rate that can be achieved for short packets also applies to longer packets. This technique allows us to monitor sixteen 10 G Ethernet lines at full speed for 256 B packets. For comparison, the average packet length on our organisation's border lines is over 800 bytes. When the packet parsing is performed by the NIC and the information from the packet headers is passed using the Unified Headers, the performance is increased even more. The CPU utilisation is also lower, since packet header parsing is a CPU-intensive task. However, this approach trades monitoring flexibility for high performance. The purpose of the last test was to show the difference that can be achieved by using faster CPUs. We have shown that the performance on short packets can rise by more than 30 % when using faster CPUs with higher memory bandwidth.

We conclude that the monitoring system can achieve even higher performance by utilising better CPUs. However, the cost-to-performance ratio is also important for production use. More attention needs to be paid to packet loss management. While we aim to achieve lossless flow monitoring, packets are nonetheless lost in extreme scenarios. In our case, the packets are lost by the NIC at the input when the input buffers are full and DMA transactions are stalling. Therefore, when a single DMA channel is blocked, packets are lost for all channels. This is clearly suboptimal, since the other cores could process more packets, yet their RX queues stop receiving packets as well. An improved design for the NIC, where packets are dropped for each DMA channel separately, should be implemented in the future. There are several improvements that can be made to achieve higher performance. We can buy better CPUs, but the budget for monitoring systems is often limited. The driver for the NICs should be made NUMA-aware to avoid costly memory accesses over the QPI bus. Moreover, the current driver allows reading only a single packet at a time. Therefore, processing packets in batches could be used to achieve better performance. We have also shown that features like packet trimming can significantly improve the performance of flow monitoring systems. If the next generation of commodity cards is to be used for network flow monitoring, such features can turn out to be crucial for handling large volumes of data. The performance of the PCI Express bus can be doubled by using 16 lanes, which is an approach that can be expected to be used in the future.

5.5 Summary

This chapter has discussed the performance of flow monitoring systems. We have explained different methods of measuring the overall performance and have shown that it is important to choose the one that provides the most appropriate results. Different factors that have an impact on the flow monitoring were considered as well. We have shown that the hardware used, the system settings, the packet capture framework, the settings of the flow exporter, and the traffic mix have a considerable impact on the results of the measurement. CPU processing power and memory controller throughput are the most obvious bottlenecks for flow monitoring. We have briefly discussed ways to detect these bottlenecks and decide whether the memory footprint or CPU cycles need to be optimised. The performance of flow monitoring significantly depends on the performance of the underlying packet capture framework. We have presented the evolution of the state-of-the-art packet capture frameworks. Linux NAPI can be used for packet capture on gigabit networks; however, to achieve full packet capture on 10 Gbps and faster networks, specialised NIC drivers together with kernel network stack bypass must be used. A zero-copy approach and receive side scaling are a necessity to achieve these speeds. The buffers to which the NIC delivers packets through DMA transfers are mapped directly to user-space. Interrupts are usually disabled at high speeds, and extensive polling is performed by the receiving application. When the data need to be shared between multiple applications, the RX queues are managed by the kernel driver. In this case, the provided API must enable the reception of multiple packets at once, a technique called batch or burst packet processing, to avoid unnecessary context switches between kernel and user-space. In the case where only a single user-space application is processing the packets, such as in the DPDK framework, the RX queues are managed directly from user-space, which increases the performance due to the smaller number of context switches. Although packet capture is an important part of the flow monitoring process, many optimisation techniques can be applied to the flow monitoring itself as well. We have discussed the possibility of hardware acceleration using specialised FPGA-based NICs that allow offloading part of the flow processing to the NIC itself. We have shown that depending on the capabilities of the NIC, different levels of offloading can be achieved.

However, the flow exporter must be aware of these capabilities and actively cooperate with the NIC. Multiple optimisations can be used in the design of the flow exporter software as well. We have argued that multithreading and NUMA awareness are key to flow monitoring performance, although special care needs to be taken to configure the setup correctly. Other optimisation techniques such as flow cache design, per-flow expiration timeouts, delayed packet parsing, and the use of biflows have been considered as well. Finally, we built a high-density flow monitoring system capable of processing 16×10 Gbps. The system utilised two FPGA-based NICs that allowed us to test the offloading of packet processing. We have shown that this technique allows processing a significantly higher amount of traffic. The number of concurrent flows that needed to be kept in the flow cache had a considerable impact on the measured performance. Moreover, we also demonstrated the importance of having a powerful enough CPU by repeating the tests with two different CPUs. We believe that by utilising more of the proposed optimisation techniques, we should be able to achieve an even higher throughput of the system, possibly reaching the 200 Gbps rate offered by the latest NICs available on the market.


Chapter 6

Measurement of Encrypted Traffic

Application flow monitoring relies heavily on the ability to observe and process data from the application layer. However, users and companies are becoming more privacy conscious, and the use of encryption is steadily increasing. This chapter presents an overview of current approaches for the classification and analysis of encrypted traffic. The contribution of our work is four-fold. First, we describe several of the most widely used encryption protocols to show their packet structure and standard behaviour in a network. This information forms the basis of all classification methods which use either a specific packet structure or a communication pattern to identify the protocol. Second, we investigate what information is provided by these encryption protocols. Most protocols negotiate encryption algorithms in clear-text; these data can be monitored, e.g., to reveal the use of weak ciphers. Third, we describe how the structure of encryption protocols can be used to detect these protocols in a network. Traffic classification algorithms using such information are presented, and we describe several open-source tools which implement these algorithms. Fourth, we provide an extensive survey of behaviour-based methods for encrypted traffic classification. We show that surprisingly detailed information can be obtained using these methods. In specific cases, even the content of the encrypted connection can be established. The article included in this chapter is [A3]. The organisation of this chapter is as follows:

• Section 6.1 provides motivation for this chapter and defines its goals.
• Section 6.2 explains the principles of several widely used encryption protocols.
• Section 6.3 describes what information can be obtained from encrypted traffic using specific knowledge of the encryption protocols.
• Section 6.4 describes the traffic classification taxonomy used in this chapter.
• Section 6.5 describes traffic classification methods based on specific knowledge of encryption protocols.
• Section 6.6 surveys papers on the classification of encrypted traffic using statistical and behavioural methods.
• Section 6.7 concludes the chapter and provides directions for future research.


6.1 Motivation and Goals

Network visibility is becoming a necessity in current networks. Security, traffic provisioning, and failure detection are the prime reasons to deploy traffic measurement. Yet, measurement has other uses, and new ones are still being discovered. For instance, application performance can be measured using the data from the application layer. Information about certain applications can also be used to detect attacks and intrusions on the application level. In contrast to this, the need for protection of transmitted data and user privacy is rapidly increasing. It is for this reason that the encryption of transmitted data is increasingly used. The ratio of encrypted traffic has recently been rising steeply as common Internet services become protected [228]. This change poses a challenge to currently used methods for traffic measurement, for which the identification and analysis of network traffic become increasingly difficult. Before any further analysis of encrypted network traffic can be done, the traffic needs to be identified. Statistical and behaviour-based application identification methods are less affected by encryption than deep packet inspection methods. Therefore, a lot of attention has been given to these methods, which are also considered to be privacy-conscious. However, information can be extracted even from encrypted connections, mainly from the session's initiation. The goal of this chapter is to provide a comprehensive overview of methods for classifying and analysing encrypted traffic. To the best of our knowledge, this is the first work which comprehensively summarises the available approaches for encrypted traffic classification and analysis. Although there has been a lot of research on traffic classification in the past decade [87, 86, 229, 230, 88], most of the existing surveys do not explicitly consider research which targets encrypted traffic. A recent survey by Cao et al. [231] describes the fundamentals of encrypted traffic classification. It briefly introduces recent advances and challenges in this field. However, the survey covers only a few methods of traffic classification and does not provide any comparison or assessment of these methods. Moreover, while traffic classification is an important part of network monitoring, there are other methods for analysing encrypted traffic which need to be taken into consideration. To achieve this goal, we have produced a survey of methods for classifying and analysing encrypted traffic published in journals, conference papers, proceedings of specialised workshops, and technical reports. We studied works presented in selected computer science journals, mainly Communications Surveys & Tutorials, Computer Networks, the International Journal of Network Management, and Transactions on Network and Service Management. We also surveyed international conferences such as IMC, PAM, CISDA, CNSM, IM, and NOMS over the period 2005–2014. Other methods were found in the references provided by the surveyed papers and in papers that referenced them. Over the last decade, many statistical and machine learning algorithms have been applied to the problem of traffic classification. However, the authors use different methodologies and datasets to evaluate their methods, and the results are therefore not directly comparable. Most of the methods use supervised or semi-supervised machine learning algorithms to classify flows and even determine the application protocol of the flow. Most methods target encryption protocols such as SSH, SSL/TLS, and encrypted BitTorrent.
To describe and categorise the classification methods, we use the taxonomy of Khalife et al. [232].

6.2 A Description of Encryption Protocols

This section provides a description of chosen encryption protocols. We selected the most widely used protocols to demonstrate the basic principles and show different approaches to secure data transport. Our choice of encryption protocols was also influenced by which protocols are most commonly used in research on encrypted traffic classification. We shall describe the IPsec, TLS, SSH, BitTorrent, and Skype protocols in this section.


All these protocols provide confidentiality using encryption, and they usually offer authentication of communication peers, data integrity, replay protection, and non-repudiation. We shall describe the use of these protocols, their fundamental properties, the structure of encrypted packets, and the type of exchanged data. The information provided in this section will serve as the basis for the rest of this chapter. Almost all the presented encryption protocols can be divided into two main phases: the initialisation of the connection and the transport of encrypted data. The first phase can be further divided into an initial handshake, authentication, and a shared secret establishment. During the first phase, algorithm capabilities are usually exchanged, communication parties are authenticated, and secret keys are established. These keys are then used for encrypting the transferred data in the second phase. This general protocol scheme is depicted in Figure 6.1 and, with minor modifications, can be applied to almost any protocol providing confidential data transfer.

Figure 6.1: A General Scheme of Network Security Protocols.

The protocols presented in this section are listed in order of their position in the ISO/OSI reference model [233]. The IPsec protocol suite, which operates on the network layer, is described first, followed by the TLS and SSH protocols on the presentation layer. The BitTorrent and Skype protocols represent the application layer, and they implement their own protocols for secure data transmission.

6.2.1 Internet Protocol Security

Internet Protocol Security (IPsec) is a framework of open standards for ensuring authentication, encryption, and data integrity on the network layer. Due to its location on the network layer, IPsec can protect both the data within the packet and the L3 information (e.g., IP addresses) in each packet [234]. The main advantage of using IPsec is securing the network connection without the necessity of modifying any application on the clients or servers. However, it provides less control and flexibility for protecting specific applications. IPsec follows the general scheme depicted in Figure 6.1. The first phase is represented by the Internet Key Exchange Version 2 (IKEv2) protocol [235]. IPsec uses the UDP protocol on port 500, over which all messages covering the initial handshake, the authentication, and the shared secret establishment are exchanged. Two protocols can be used in the second phase of IPsec: the Authentication Header (AH) and the Encapsulating Security Payload (ESP). In the initial version of IPsec, the ESP protocol providing data confidentiality did not include authentication, so ESP and AH were used together. Nowadays, the current version of ESP contains authentication, and AH has become less significant, although it is still used to authenticate portions of packets that ESP cannot manage. The ESP protocol is the main protocol of IPsec. It provides data confidentiality, origin authentication, connectionless integrity, an anti-replay service, and limited traffic flow confidentiality [236]. ESP adds a header and a trailer to each transferred packet, see Figures 6.2 and 6.3, placed according to the mode used. The ESP and AH protocols can operate in two modes: transport and tunnel. In the tunnel mode, a new IP header is created for each packet with the endpoints of the tunnel as the source and destination addresses; in the transport mode, the original IP header is used.


Figure 6.2: The IPsec Packet Structure in the Transport Mode.

Figure 6.3: The IPsec Packet Structure in the Tunnel Mode.

6.2.2 Transport Layer Security

Transport Layer Security (TLS) [237] is based on the Secure Sockets Layer version 3 (SSLv3) protocol [238] and provides transport-level security directly on top of the TCP protocol. Specifically, it provides confidentiality, data integrity, non-repudiation, replay protection, and authentication through digital certificates. TLS is currently one of the most common protocols for securing network communication. It is mainly used for securing HTTP, FTP, and SMTP sessions, as well as for Virtual Private Networks and Voice over Internet Protocol (VoIP). The protocol design is layered and consists of different sub-protocols, as well as configurable and replaceable cryptographic algorithms [239]. The main part of TLS is the Record Protocol [237], which acts as an envelope for application data as well as TLS messages. In the case of application data, the Record Protocol is responsible for dividing the data into optionally compressed fragments. Each fragment placed in a record is complemented by a Message Authentication Code (MAC). For more details, see Figure 6.4. Depending on the selected security algorithms, a fragment and its MAC are encrypted together and sent as one TLS record. A packet may contain more than one record to avoid sending multiple short packets.

Figure 6.4: The TLS Record Packet Format.
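Since every TLS record begins with a cleartext five-byte header (content type, protocol version, and fragment length), its structure can be sketched as follows; the plausibility check at the end is a simplification commonly used by classifiers, not a complete validation.

    /* Sketch of parsing the 5-byte cleartext TLS record header that precedes
     * every record; this is the part of the record that remains visible to
     * flow exporters even for encrypted traffic. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct tls_record_hdr {
        uint8_t  content_type;    /* 20 change_cipher_spec, 21 alert,
                                     22 handshake, 23 application_data */
        uint8_t  version_major;   /* 3 for SSL 3.0 / TLS 1.x */
        uint8_t  version_minor;
        uint16_t length;          /* length of the following fragment */
    };

    bool parse_tls_record_hdr(const uint8_t *p, size_t len, struct tls_record_hdr *out)
    {
        if (len < 5)
            return false;
        out->content_type  = p[0];
        out->version_major = p[1];
        out->version_minor = p[2];
        out->length        = (uint16_t)((p[3] << 8) | p[4]);   /* network byte order */

        /* Simple plausibility check used by lightweight classifiers. */
        return out->content_type >= 20 && out->content_type <= 23 &&
               out->version_major == 3;
    }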

During the first phase of a TLS connection, the communicating parties are usually authenticated (more often, only the server is authenticated) using an X.509 certificate chain [240], as shown in the general scheme in Figure 6.1. Alternatively, a previous connection can be resumed without authentication.

TLS messages exchanged during this phase are unencrypted and do not contain a MAC until the shared keys are established and confirmed. In the second phase, these keys are used directly by the Record Protocol, which applies the selected algorithms to ensure the security of the communication.

6.2.3 Secure Shell Protocol

In a similar fashion to the TLS protocol, the Secure Shell (SSH) protocol [241] exists as a separate application on top of TCP. This protocol uses a client-server model where the server usually listens on TCP port 22. SSH was originally designed to provide remote login access to replace unsecured Telnet connections. Nowadays, it can be used not only for remote login and a shell, but also for file transfers using the associated SSH File Transfer Protocol (SFTP) [242] and the Secure Copy (SCP) [243] protocol, or for Virtual Private Networks (VPN). The SSH protocol provides user and server authentication, data confidentiality and integrity, and optional compression.

Figure 6.5: The SSH Protocol Packet Format.

SSH consists of three protocols, of which the most important is the Transport Layer Protocol, which provides the establishment of the whole connection and its management. It defines the SSH packet structure, which is depicted in Figure 6.5. The MAC is computed on the plaintext payload together with the packet length, the padding, and a continuously incremented sequence number not present in the packet itself. The other SSH protocols are the User Authentication Protocol and the Connection Protocol for multiplexing multiple logical communication channels [244]. Each SSH connection passes through the same phases that were depicted in Figure 6.1. In the first phase, a TCP connection is established, and information about the preferred algorithms is exchanged. During authentication, the server sends its public key, which must be verified by the client (using a certification authority or manually through a different channel). The shared keys are subsequently established and confirmed. All following packets are then encrypted and authenticated.

6.2.4 BitTorrent

BitTorrent [245] is an application protocol based on peer-to-peer network communication and used for sharing large amounts of data over the Internet. Originally, the protocol did not provide any network communication security. Once its popularity increased, some Internet Service Providers (ISP) started to limit this type of traffic. In response, the Message Stream Encryption (MSE) algorithm [246], also known as Protocol Encryption (PE), was introduced. It serves as an obfuscation algorithm that makes BitTorrent traffic identification more difficult. In addition to obfuscation, the mechanism also provides some level of confidentiality and authentication for the communicating peers.

The MSE protocol specification [246] describes MSE as a transparent wrapper for bidirectional data streams over TCP which prevents passive eavesdropping and thus protocol content identification. MSE is also designed to provide limited protection against active man-in-the-middle attacks and port scanning by requiring a weak shared secret to complete the handshake.


The major design goal was payload and protocol obfuscation, not peer authentication or data integrity. Thus, it does not offer protection against adversaries which already know the data necessary to establish a connection (that is, the IP address, port, shared secret, and payload protocol).

The first general phase of the MSE protocol follows the TCP three-way handshake and starts with an exchange of newly generated Diffie-Hellman (DH) public keys (together with random data padding for better obfuscation). The shared key is computed from the DH exchange and combined with hashed information about the requested data, which acts as a pre-shared secret. After the shared key has been successfully confirmed, the packet payload is completely encrypted with the RC4 stream cipher.

6.2.5 Skype

Skype [247] is a peer-to-peer VoIP application providing not only video chat and voice calls, but also instant messaging and file exchange services. As the protocol is not publicly documented, it is not possible to describe its specifics accurately. The main reason for this is network data obfuscation, which, similarly to the BitTorrent protocol, makes the detection of Skype traffic more difficult.

The Skype protocol operates over both UDP and TCP depending on network characteristics. If the network imposes no restrictions, the application usually sends data traffic over UDP and signalling traffic over TCP [248]. If UDP cannot be used (for example, because a firewall blocks it), Skype sends both the signalling and the data traffic over TCP. When TCP is used, the connection is usually established over port 80 or 443, which masks Skype traffic as standard web traffic.

Skype uses the TLS protocol over TCP port 443 and a proprietary protocol over port 80 [248] to secure and obfuscate the traffic of each communicating peer. TLS is also used in communication with other Voice over IP solutions, where it protects Session Initiation Protocol (SIP) messages [249]. For communication over UDP, Skype uses a proprietary protocol. To offer a reliable connection, UDP packets contain an unencrypted header with a frame ID number and function mode fields. The encryption in UDP connections is used only for obfuscation and not for confidentiality; therefore, there is no generated shared secret, only a proprietary key expansion algorithm. UDP connections do not follow the general scheme in Figure 6.1 because encrypted data are transferred directly, without an initialisation phase.

6.3 Information Extraction from Encrypted Traffic

Network monitoring is one of the main pillars of network security. If the appropriate data are collected, it is possible to detect network attacks and trace attackers, detect security policy violations, and monitor the performance of network applications. When traffic is encrypted, the possibility of information extraction is significantly limited. Nevertheless, some information can still be obtained, primarily from the unencrypted initialisation phase, but also from the encrypted transport phase. This section begins with a description of the initialisation phase, followed by a description of methods which use traffic feature analysis to gain information from the transport phase.

6.3.1 The Unencrypted Initialization Phase

Almost all network protocols that secure data transfer by encryption contain an unencrypted initialisation phase, as depicted in Figure 6.1. Because the data exchanged at this stage are not encrypted, they can be easily extracted and used for monitoring network traffic. Generally, two types of information common to most protocols can be extracted during this phase. The first type covers the connection itself and its properties exchanged in the initial

handshake. The second type covers the communicating peers' identifiers, which are exchanged in the authentication phase.

During the initial handshake, the parameters of a connection are negotiated, such as which cipher suite and protocol version are used. This dynamic setting of the connection properties enables backward compatibility between different software versions and allows different levels of security to be set according to established security policies. Examples include data authentication and compression in addition to the encryption itself. The list of possible identifiers, with references to algorithm specifications for IPsec, TLS, SSH, and other protocols, can be found in the IANA Protocol Registries [250]. All of this information can be used for proper connection characterisation and correct parsing of the remaining packets of the connection.

One interesting use of the information from the initial handshake is client fingerprinting based on the offered cipher suites. A large number of cipher suites exists, and client applications usually implement only a subset of them. Therefore, each application announces the supported cipher suites, and the order of their preference, during the initial handshake. This makes it possible to passively distinguish specific operating systems, web browsers, and other applications, together with their versions, based only on the cipher suites which they offer. Examples of such client fingerprinting, based on the SSL/TLS initial handshake, are the applications from SSL Labs [251] and the p0f module [252].

The authentication phase represents the second source of information which can be easily extracted from secured network traffic. Unique identifiers of one or both communicating peers, optionally supplemented by additional data about them, are exchanged during this phase. For example, in the authentication phase of the SSH protocol, the server sends its public key to the user. The user must validate the key and verify that the server knows the appropriate private key [241]. Since this information is almost unique for each SSH server, it is possible to detect server changes or man-in-the-middle attacks on servers by passive network traffic monitoring.

A very good example of extracting information about communicating peers is the monitoring of the authentication phase of the SSL/TLS protocol. In this phase, the server, and optionally also the client, sends an X.509 certificate [240] to verify its identity. The certificate contains a public key signed by a certification authority and is supplemented with information about the peer and the issuer. The content of such a certificate is shown in more detail in Figure 6.6.

Figure 6.6: Example of Skype X.509 Certificate.

Passive monitoring of certificates enables not only the identification of communicating peers but also checking whether the certificates are valid and contain proper security algorithms to fulfil local security policies. SSL/TLS certificate properties were studied by Holz et al. [253], who revealed a great number of invalid certificates, some of which were shared among a large number of hosts. The work of Holz et al. was followed by Durumeric et al. [254], who

focused mainly on assessing certification authorities. Certificate monitoring can also be used to detect malicious software trying to hide its activities by connecting to its command and control centres over the SSL/TLS protocol [255].

Even though information extraction from encrypted traffic is not a computationally intensive process, it can provide valuable information. For example, the extracted Server Name Indication (SNI) [256] can be used by a home router's firewall to filter traffic. The unencrypted initialisation phase is often used to recognise encrypted traffic, and the authentication information might be utilised to detect and prevent man-in-the-middle attacks on a network-wide level.
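As a simplified illustration of the cipher-suite fingerprinting mentioned above, the sketch below hashes the ordered cipher-suite and extension lists offered in a ClientHello into a short identifier. The construction is a hypothetical, JA3-like example and does not reproduce the exact methods used by the SSL Labs tools or the p0f module.

    import hashlib

    def cipher_suite_fingerprint(tls_version, cipher_suites, extensions):
        """Hash the ordered ClientHello parameters into a short client fingerprint."""
        parts = [str(tls_version),
                 "-".join(str(c) for c in cipher_suites),
                 "-".join(str(e) for e in extensions)]
        return hashlib.md5(",".join(parts).encode()).hexdigest()

    # Hypothetical values taken from a parsed ClientHello message.
    print(cipher_suite_fingerprint(0x0303, [0xc02f, 0xc030, 0x009e], [0, 10, 11, 13]))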

6.3.2 The Encrypted Data Transport Phase

Information about network traffic can also be extracted from the encrypted data transported between the communicating parties. Packets exchanged during the transport phase usually contain only information about the packet itself, such as the length and the authentication field, which is of little use for monitoring network traffic. Nevertheless, two methods to obtain more suitable data do exist.

The first method uses direct traffic decryption, which can be performed only if the shared secret of the connection is known. Therefore, decryption can be used on the server side of networks where the organisation knows the private key of the connected server. However, such decryption is impossible if algorithms which ensure forward secrecy [257] are used.

The second method is based on the extraction of traffic features. An example of such an analysis is presented by Miller et al. [258], who monitor the sizes of TLS-encrypted packet sequences. Based on these data, together with various predictive models, they are able to identify a specific web page and deduce its content even though the traffic is encrypted. A similar approach was also used by Koch and Rodosek [259] for analysing SSH traffic. Another example of using traffic features is the work by Hellemons et al. [260], which focused on intrusion detection in SSH traffic. Encrypted traffic features can also be used for classifying encrypted traffic, which is described in Section 6.6.
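A minimal sketch of this kind of feature extraction is shown below: it reduces a connection to the ordered, direction-signed packet sizes that sequence-based models typically consume. The representation and the function name are illustrative assumptions, not the exact features used in the cited works.

    def packet_size_sequence(packets, limit=20):
        """Ordered, direction-signed packet sizes of one connection.

        `packets` is an iterable of (direction, size) pairs, with direction +1 for
        client-to-server packets and -1 for server-to-client packets.
        """
        return [direction * size for direction, size in packets][:limit]

    # Hypothetical sizes of the first packets of a TLS connection.
    print(packet_size_sequence([(1, 571), (-1, 1460), (-1, 1230), (1, 126)]))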

6.4 A Taxonomy for Traffic Classification Methods

The first step in analysing network traffic is identifying the type of the measured traffic. Network traffic classification methods serve this purpose and rely on knowledge of the protocol packet structure, communication patterns, or a combination of both. The recognition of the TLS protocol, described in Section 6.2, may serve as an example. The protocol can be identified based on knowledge of the packet structure, especially the unencrypted parts such as the content type, version, and length. Similarly, the protocol can be identified by analysing its behaviour, e.g., the number and approximate sizes of packets sent during the unencrypted initialisation phase.

To present the current state of research on encrypted traffic classification in a comprehensive manner, we use a taxonomy of classification methods. We choose the multilevel taxonomy by Khalife et al. [232], which provides a detailed categorisation of traffic classification methods. This taxonomy is uniquely descriptive and allows us to efficiently categorise all the surveyed classification methods. Figure 6.7 shows an overview of the taxonomy levels. On the topmost level, the authors distinguish between classification input, technique, and output. The input determines the traffic characteristics which are used for classification (e.g., packets or flows). The technique describes the core of the classification method, which may be, among others, payload inspection, a statistical method, or a machine learning method. The output then describes how the traffic objects (packets or flows) map to traffic classes (application types or application protocols). The traffic classes are of different granularity. Some methods allow identification of

application protocols (e.g., HTTP), while others are more fine-grained and detect, for example, a Google search or a Facebook chat.

Figure 6.7: A Multilevel Taxonomy of Traffic Classification Methods [232].

Classification Input
    Traffic Payload: use of application data
    Traffic Properties (Measurement Level):
        Host Community: graph metrics (diameter, connection degree)
        Host: number of connections, opened ports
        Flow: flow size, flow duration
        Packet: packet sizes, inter-arrival times
    Hybrid & Miscellaneous: combination of inputs, external knowledge

Table 6.1: Classification Input Level.

Tables 6.1, 6.2, and 6.3 provide examples for each of the input, technique, and output categories. The input and technique tables have the most general categories in the left column, some of which are divided into more specific subcategories. The classification output table is divided horizontally into traffic objects and traffic classes. The objects describe what is being classified, in other words, whether it is each packet, a whole flow, or a host. The traffic classes describe the type of classification performed by a specific algorithm.

6.5 Payload-Based Traffic Classification Techniques for Encrypted Traffic

Almost every network traffic encryption protocol has a specific packet format that differs from the others, as described in Section 6.2. Thus, with knowledge of these formats, it is possible to distinguish and identify individual protocols by inspecting the packet payload. For this purpose, string or regular expression matching algorithms are used together with protocol-specific patterns. Some examples of contemporary classification tools which use payload inspection are discussed in more detail in the first part of this section. The second part presents current research papers which focus on comparing these tools in terms of their performance and success rate.


Classification Technique
    Payload Inspection: DPI, examination of first N bytes
    Graphical Techniques:
        Graphlets: relationship between features (ports, addresses)
        Motifs: patterns of communication
        Social Networks: graph of communication
    Statistical Method:
        Basic Statistical: probability density functions of e.g., packet sizes
        Heuristics: port-based classification
        Profiles: host profiling, usage of packet sizes and direction
    Machine Learning Algorithm:
        Supervised: Hidden Markov Models, Naive Bayes, k-nearest neighbour, support vector machine
        Non-Supervised: clustering of unlabeled traffic, k-nearest neighbour
        Semi-Supervised: clustering of mixed traffic, k-nearest neighbour
        Reinforcement Learning: -
    Hybrid & Miscellaneous: combination of methods, external knowledge

Table 6.2: Classification Technique Level.

Classification Output
    Traffic Objects:
        Host Community: host community is assigned a class, e.g., community of HTTP servers
        Host: host is assigned a class
        Flow: flow is assigned a class
        Packet: packet is assigned a class
    Traffic Classes:
        Traffic Cluster: bulk or small transactions
        Application Type: game, browsing, chat
        Application Protocol: HTTP, HTTPS, FTP
        Application Software: client software such as a mail client, FTP client, or web browser
        Fine-Grained: Skype voice call, Google search, Facebook chat
        Anomaly: port scan, brute-force attack
    Hybrid & Miscellaneous: combination of outputs or classes, external knowledge

Table 6.3: Classification Output Level.

6.5.1 Payload-Based Classification Tools

Most network traffic classification tools aim to recognise all network protocols, not only the encrypted ones. The following examples represent the most widely used tools for classifying network traffic. Most of these tools are also able to distinguish specific network applications, mainly in unencrypted traffic. In terms of the taxonomy, these tools mostly use the Payload Inspection technique on the Traffic Payload classification input to map Flows to Application Protocols.

PACE [261] is a commercial classification library written in C, which uses pattern matching augmented by heuristic, behavioural, and statistical analysis. In addition to standard protocol and application recognition, it is able to identify obfuscated protocols such as BitTorrent or Skype. According to its website, PACE is able to identify thousands of network applications and protocols.

Cisco Network-Based Application Recognition (NBAR) [5] is another example of a commercial tool for classifying network traffic. This tool is primarily used on Cisco routers for quality and

security purposes. According to its authors, NBAR is also able to recognise stateful protocols as well as non-TCP and non-UDP IP protocols.

nDPI [98] is an open-source classifier forked from the (currently closed) project OpenDPI, which was in turn derived from PACE. nDPI analyses at most eight packets from each connection to classify the traffic; however, each packet is examined separately. If the connection contains multiple matches, the most detailed match is returned. For encrypted traffic recognition, nDPI contains only an SSL decoder that extracts the hostname from the server certificate. Using these names, nDPI is able to identify specific network applications.

Libprotoident [100] is an open-source C library for classifying traffic. In contrast to the previous tools, Libprotoident inspects only the first four bytes of the packet payload in each direction. This makes it much faster but reduces its detection accuracy. The classification uses a combined approach of pattern matching, payload size, port numbers, and IP matching.

L7-filter [262] is an open-source classifier for Linux which is designed to classify traffic on the application layer. The initial phase of classification is based on non-payload data such as port numbers, IP protocol numbers, the number of transferred bytes, and so forth. Payload data are analysed with regular expressions during the second phase. One disadvantage of L7-filter is that it contains a database of old patterns which was last updated in 2011.
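The following sketch illustrates the payload-inspection principle shared by these tools: the first payload bytes of a connection are matched against protocol-specific patterns. The patterns shown are simplified, well-known heuristics (a TLS handshake record header, the SSH identification string, an HTTP request line) and are far less complete than the pattern databases shipped with the tools above.

    import re

    # Simplified, well-known first-payload patterns; real tools ship much larger,
    # regularly updated pattern databases.
    SIGNATURES = {
        "ssl_tls": re.compile(rb"^\x16\x03[\x00-\x03]"),               # TLS handshake record
        "ssh": re.compile(rb"^SSH-[12]\.[0-9]"),                       # SSH identification string
        "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) "),  # HTTP request line
    }

    def classify_payload(first_payload: bytes) -> str:
        """Return the protocol whose pattern matches the first payload bytes."""
        for protocol, pattern in SIGNATURES.items():
            if pattern.match(first_payload):
                return protocol
        return "unknown"

    print(classify_payload(b"SSH-2.0-OpenSSH_7.4"))    # -> ssh
    print(classify_payload(b"\x16\x03\x01\x00\xa1"))   # -> ssl_tls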

6.5.2 A Comparison of Classification Tools

A comparison of the presented open-source tools was introduced by Finsterbusch et al. [88]. They prepared a dataset containing the traffic of 14 different network protocols, such as DNS, HTTP, BitTorrent, and SMTP(S), to compare the tools. Using this dataset, they measured classification accuracy, memory usage, CPU utilisation, and the number of packets required for proper classification. The comparison showed, amongst other results, that nDPI is not able to classify the BitTorrent protocol with more than a 43 % true positive rate, although it detects SSL/TLS with 100 % accuracy. The Libprotoident tool had the highest classification accuracy on the whole analysed traffic and was able to classify DNS, HTTP, SIP, and e-mail protocols with 100 % accuracy. Libprotoident also needs the lowest number of packets on average for classifying real-time traffic. Based on the comparison by Finsterbusch et al., we can say that Libprotoident is the most appropriate tool for payload-based traffic classification, although it is more CPU intensive than the other tools.

Another comparison of the tools was carried out by Bujlow et al. [106]. They compared all of the previously presented tools. They prepared a publicly available dataset for the comparison which contained encrypted and unencrypted network traffic from 17 application protocols, 25 network applications, and 34 web services. Their results show that PACE and Libprotoident are the most accurate tools. Nevertheless, the Libprotoident tool was the only classifier able to identify all the encrypted protocols which were tested. Their results were generally very similar to those in the comparison by Finsterbusch et al.

In general, the payload-based classification tools utilise regular expression matching algorithms to identify encrypted traffic. The main difference between the tools is how much data they need to examine and whether they need both directions of the connection for the classification.

6.6 Feature-Based Traffic Classification Techniques for Encrypted Traffic

This section surveys feature-based classification methods which specialise in encrypted traffic. These methods do not require any knowledge of the encryption protocol packet structure. Instead, they use specific protocol communication patterns to classify encrypted traffic. The methods are based on the specific protocol differences described in Section 6.2, such as the packet and

flow features of the unencrypted initialisation or the encrypted data transport phase. This approach provides greater generalisation and allows these methods to work with new versions and types of encryption protocols without modifying the underlying algorithms.

The taxonomy specified in Section 6.4 is used to describe the individual methods. Apart from the properties defined by the taxonomy, we also provide information about the datasets used for evaluating these methods. This is especially important when additional evaluation is to be performed by other research groups to verify the authors' results. While the taxonomy provides a traffic class for classifying the application protocol, this is sometimes too coarse for our purposes. Most classification methods identify not only the encryption protocol but also the underlying encrypted application protocol. As both cases belong to the application protocol traffic class, we provide further explanation in the description of each classification method.

A slight drawback of most flow-based classifiers is that they perform the classification only after the flow has expired. This prevents a real-time response, which has led several research groups to investigate real-time classification using flow features. Their methods are also included with the others in Table 6.4 and differentiated by a column describing whether the classification is done in real time or not.

Almost 250 discriminators (flow or packet features) which can be used to classify flow records are identified by Moore et al. [172]. The authors do not propose any specific classification method themselves; however, most classification algorithms use a subset of these discriminators for identifying traffic. In terms of the taxonomy, the authors provide a list of traffic features which fall under the traffic properties category of the classification input.

Most of the feature-based traffic classification methods use statistical or machine learning methods. These methods are comprehensively described in [263]. Port-based classification was used in the past to associate applications with network connections, but the accuracy of this method is decreasing with the increased use of dynamic ports and applications evading firewalls. Despite the decreased accuracy, port numbers are often utilised as one of the packet features. Furthermore, port-based classification is still quite often used to establish a ground truth for traffic classification experiments.

For easier orientation, we present the surveyed papers grouped by the class of their traffic classification algorithm. Most of the papers employ Supervised Machine Learning Methods (Section 6.6.1), Semi-Supervised Machine Learning Methods (Section 6.6.2), and Basic Statistical Methods (Section 6.6.3). The rest combine more than one method and are therefore gathered in a Hybrid Methods category (Section 6.6.4). For each surveyed paper, we specify the classification technique, the classification output, and the datasets used in the description of the classification process itself. Table 6.4 in Section 6.6.5 provides a summary of the papers with all of the mentioned properties properly categorised.
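Before turning to the individual methods, the following sketch makes the notion of flow discriminators concrete by computing a handful of them (duration, byte and packet counts, packet-size and inter-arrival statistics) from a list of packet timestamps and sizes. It covers only a tiny, illustrative subset of the discriminators of Moore et al., and the feature names are ours.

    from statistics import mean, stdev

    def flow_discriminators(packets):
        """Compute a few common flow discriminators.

        `packets` is a chronologically ordered list of (timestamp, size) tuples
        belonging to one flow; timestamps are in seconds.
        """
        times = [t for t, _ in packets]
        sizes = [s for _, s in packets]
        gaps = [b - a for a, b in zip(times, times[1:])]
        return {
            "duration": times[-1] - times[0],
            "packets": len(packets),
            "bytes": sum(sizes),
            "mean_packet_size": mean(sizes),
            "std_packet_size": stdev(sizes) if len(sizes) > 1 else 0.0,
            "mean_inter_arrival": mean(gaps) if gaps else 0.0,
            "std_inter_arrival": stdev(gaps) if len(gaps) > 1 else 0.0,
        }

    print(flow_discriminators([(0.00, 74), (0.05, 1448), (0.06, 1448), (0.30, 66)]))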

6.6.1 Supervised Machine Learning Methods

Sun et al. [264] propose a hybrid method for classifying encrypted traffic. First, the SSL/TLS protocol is recognised using a signature matching method. A Naive Bayes machine learning algorithm is then applied to identify the encrypted application protocol. The authors use a combination of public and private datasets to evaluate their method. Background traffic, containing the BitTorrent, eDonkey, HTTP, FTP, Thunder, and GRE application protocols, is taken from a public dataset. The signature-based recognition of the SSL/TLS protocol was tested on HTTPS, TOR, ICQ, and other protocols. The identification of the underlying protocol was tested only for the TOR and HTTPS protocols.

Okada et al. [265] analysed changes in flow features due to encryption. They created a training dataset with HTTP, FTP, SSH, and SMTP application protocols encrypted using PPTP and


IPsec tunnels. The authors assessed 49 flow features and analysed which of them are strongly correlated in normal and encrypted traffic. The correlated features were then used to infer functions which transform the features from normal to encrypted traffic. Therefore, standard classifiers can be used to classify the traffic after the transformation. The authors verified their method using several modifications of the Naive Bayesian classifier.

Arndt and Zincir-Heywood [266] also concentrated on the classification of encrypted traffic and compared C4.5, k-means, and a Multi-Objective Genetic Algorithm (MOGA) to this end. The classification of the SSH protocol was used as an example in their study. The authors focused on the accuracy and robustness of the algorithms. Stability was tested using three different public and private datasets for training and evaluating the methods. Multiple different flow export settings were tested as well. Altogether, 46 flow features were used in the evaluation; however, a different subset was used by each algorithm. The C4.5 algorithm provided the best robustness, although the MOGA had a very low false positive rate when used on the same dataset it was trained on. C4.5 was recommended for forensic analysis by law enforcement since it is applicable to a variety of networks.

Alshammari and Zincir-Heywood have published several papers [267, 268, 269, 270, 271, 272] on traffic classification using various supervised machine learning methods. They focused on recognising SSH, Skype, and in one case Gtalk traffic using flow features without port numbers, IP addresses, or payloads. A set of 22 flow features was mostly used, although in one case the authors selected the features using genetic programming. Public datasets were used as well as a private dataset generated on a testbed network. The ground truth for the public datasets was gained from port numbers, while the private dataset included the payload and was labelled with a commercial packet classification tool. The following algorithms for traffic classification were compared: AdaBoost, RIPPER, Support Vector Machine (SVM), Naive Bayes, C4.5, and Genetic Programming.

Kumano et al. [273] investigated the identification of real-time applications in encrypted traffic. IPsec and PPTP encryption were applied to web, interactive, and bulk transfer flows to create an evaluation dataset. C4.5 and SVM algorithms were utilised to classify the application on these datasets. The authors then tested the accuracy of the classification using a different number of packets from the start of the flows. They measured the impact of using fewer packets on the flow features and proposed using the features which change the least to classify applications at an early stage.
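A hedged sketch of the supervised workflow used in these studies is shown below; scikit-learn's CART-style decision tree stands in for C4.5, and the feature matrix and labels are randomly generated stand-ins for a labelled flow dataset.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    def train_flow_classifier(X, y):
        """Train and evaluate a decision tree on labelled flow feature vectors."""
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=0)
        clf = DecisionTreeClassifier()
        clf.fit(X_train, y_train)
        print("accuracy:", clf.score(X_test, y_test))
        return clf

    # Toy data: random feature vectors with random labels stand in for real,
    # labelled flow records.
    X = np.random.rand(200, 5)
    y = np.random.choice(["ssh", "non-ssh"], size=200)
    train_flow_classifier(X, y)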

6.6.2 Semi-Supervised Machine Learning Methods

Bernaille and Teixeira [274] used traffic clustering to detect applications encrypted by SSL. Their method has three steps. First, they detect SSL connections by applying a clustering algorithm (a Gaussian Mixture Model) to the sizes and directions of the initial packets of a connection. The first three packets and 35 clusters provide good accuracy in detecting SSL traffic. After the SSL traffic is identified, the first data packets of the connections are located. In the third step, the sizes of these data packets are used by a clustering algorithm to detect the underlying application; the packet sizes are modified in this step to account for the encryption overhead. The evaluation of the proposed method is done on traffic traces from live networks and a manually generated packet trace. The datasets contain HTTP, POP3, FTP, BitTorrent, and eDonkey application protocols encrypted using SSL. The authors also show that using a combination of clustering and port numbers to differentiate between applications in clusters provides better results than clustering alone.

Maiolini et al. [275] identify SSH traffic and determine the transported application protocols, such as SCP, SFTP, and HTTP, using a k-means algorithm. Only three packet features are used: the direction, the number of bytes, and the timestamp of each packet. Their private datasets are

created from artificial traffic and contain the HTTP, FTP, POP3, and SSH protocols. Control packets such as the TCP handshake, retransmitted packets, and ACK-only packets are removed from the statistics as they negatively affect the precision. The authors use only the first 3 to 7 packets to achieve real-time identification.

Bacquet et al. [276, 277] use a Multi-Objective Genetic Algorithm to select a flow feature subspace and parameters for a clustering algorithm which detects encrypted traffic. The second work employs a hierarchical k-means algorithm to increase the identification accuracy. The authors evaluate both approaches on a private dataset captured at a university campus. SSH is used as a representative of encrypted traffic, and the ground truth is gained from the payload of the captured packets. Based on previous works, the authors argue that the feature selection and the number of clusters highly affect the overall accuracy. Therefore, the authors selected four objectives for the genetic algorithm: cluster cohesiveness, cluster separation, the number of clusters, and the number of used flow features. The results show (a) that only 14 of the total of 38 flow features were used by the best-performing algorithm and (b) that using a hierarchical k-means algorithm increases the identification performance.

Bar-Yanai et al. [278] combined the k-means and k-nearest neighbour clustering algorithms to construct a new, real-time classifier for encrypted traffic. The resulting classification algorithm has the lightweight complexity of the k-means algorithm and the accuracy of the k-nearest neighbour algorithm. They claim the method is fast, accurate, and robust with regard to encryption, asymmetric routing, and packet ordering. A labelled dataset was prepared from generated samples of the traffic, and the classification of the dataset was done using the payloads of the packets. Flows shorter than 15 packets were removed from the dataset since real-time classification of such short flows is not of practical use. If available, the first 100 packets were used for the classification. The authors stress that their method is applicable in a real-time environment and tested their implementation on an ISP link. The application protocols classified are HTTP, SMTP, POP3, Skype, eDonkey, BitTorrent, Encrypted BitTorrent, RTP, and ICQ.

Zhang et al. [279] propose an improvement to the k-means clustering algorithm. By using the harmonic mean to reduce the impact of random initial clustering centres, the authors are able to increase the accuracy of the k-means clustering algorithm for classifying encrypted traffic. The authors test their approach on two datasets. The first contains only data labelled SSH and non-SSH; the second contains traffic from the Skype, QQ, SSH, SSL, and MSN protocols. However, the selection of the datasets and the selection of the flow features used for classification were not justified in the paper.

Du and Zhang [280] used a k-means algorithm to discern the traffic of three BitTorrent clients. Flow and packet header features, such as IP addresses, port numbers, and packet lengths, were taken into consideration. The authors generated the traffic manually and, therefore, their data sample contains only traffic from these three clients. The authors highlight that the method is fast and simple and that it can be used to identify the traffic in real time.
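The clustering step shared by several of the approaches above can be sketched in a few lines of scikit-learn; the example below groups connections by the direction-signed sizes of their first packets using k-means. This is only an illustration of the general idea: Bernaille and Teixeira, for instance, used a Gaussian Mixture Model rather than k-means, and all parameter values here are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_first_packets(flows, n_clusters):
        """Cluster connections by the direction-signed sizes of their first packets."""
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
        labels = model.fit_predict(np.array(flows))
        return model, labels

    # Hypothetical signed sizes of the first three packets of six connections.
    flows = [[120, -1448, -512], [115, -1448, -498],
             [310, -220, 180], [305, -230, 175],
             [90, -90, 60], [95, -88, 62]]
    model, labels = cluster_first_packets(flows, n_clusters=3)
    print(labels)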

6.6.3 Basic-Statistical Methods

De Montigny-Leboeuf [281] described the process of identifying traffic using flows and flow features. The author showed how to derive the flow features he uses and how to apply them to the identification of the application type (e.g., interactive typing, data transfer) and block ciphers. Moreover, the author provides a list of recognition criteria for HTTP and HTTPS web browsing traffic, IMAP, POP, SMTP, SSH, Telnet, rlogin, FTP command and data, MSN chat, and TCP audio streams. A subset of 39 traffic features was used to identify each application.

Wang et al. [282] computed the entropy of packet payloads and used it for classifying traffic. They differentiate between eight traffic classes: text, picture, video, audio, Base64-encoded text, Base64-encoded image, compressed, and encrypted.


Figure 6.8: A Visual Representation of Transport-Layer Interactions for Various Applications [286].

The entropy is computed on chunks of different lengths, and the authors used a support vector machine algorithm for the feature selection. The computation of the entropy for four different chunk lengths was found to be sufficient for accuracy and performance. The most difficult task was to separate encrypted and compressed traffic. The authors provided an additional heuristic to distinguish these categories using the frequencies of four-bit characters. The datasets used were manually obtained from captured network communication and processed to contain traffic from all classes.

Korczynski and Duda [283] designed a method called Statistical Protocol Identification to identify types of communication (voice call, SkypeOut, video conference, chat, file upload, file download) in Skype traffic. Nine flow features were selected using forward selection, which evaluates how a given feature improves the classification performance. A private, artificially created dataset with Skype traffic was used. Other traffic, with SSL, SSH, HTTP, SCP, SFTP, VoIP, BitTorrent, and other services, was used to test the robustness of the method. The authors reported high accuracy for their method, although the distinction between voice and video traffic remains a difficult problem.

Amoli and Hamalainen [284] described a Network Intrusion Detection System (NIDS) capable of detecting attacks in encrypted network traffic in real time. The NIDS has two engines; the first one uses network change measurement to detect changes in a time series, such as DoS and DDoS attacks. It clusters the data to lessen the load on the second engine. The goal of the second engine is to detect the bot master behind the attack. The second engine clusters the prepared data to obtain the behaviour of the attackers and compares it with historical data. The authors used the detected anomalies to identify the botnet masters.

Korczynski and Duda [285] proposed using stochastic fingerprints based on Markov chains for identifying application traffic in SSL/TLS sessions. The method is payload-based and uses statistical information from SSL/TLS headers. The authors tested their method on twelve representative applications such as Twitter, Skype, and Dropbox. The Markov chain fingerprints are based on protocol-specific distributions of packets in time. The datasets were captured from real traffic, contain only SSL/TLS traffic, and were not published. The ground truth was obtained by inspecting the domain names of the SSL/TLS traffic. The authors discovered that many protocol implementations differ from the RFC specification, which required an adjustment of the fingerprints.
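The sketch below illustrates the general idea behind such Markov-chain fingerprints: transition probabilities between observed message types are estimated per application, and new sequences are scored against them. It is a deliberately simplified reconstruction, not the exact model of [285].

    from collections import defaultdict

    def markov_fingerprint(sequences):
        """First-order transition probabilities estimated from message-type sequences."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1
        return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
                for a, nxt in counts.items()}

    def sequence_likelihood(fingerprint, seq):
        """Score a new sequence against an application's fingerprint."""
        p = 1.0
        for a, b in zip(seq, seq[1:]):
            p *= fingerprint.get(a, {}).get(b, 1e-6)  # smooth unseen transitions
        return p

    fp = markov_fingerprint([["ClientHello", "ServerHello", "Certificate", "Finished"]])
    print(sequence_likelihood(fp, ["ClientHello", "ServerHello", "Certificate"]))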

6.6.4 Hybrid Methods

Karagiannis et al. [286] focused on host-based classification. Their method uses only information from the network level and is therefore not affected by transport layer encryption (e.g., TLS). The authors classified the behaviour of the hosts on the social, functional, and application levels without access to packet payloads or headers. Each level was classified independently, and a cross-level classification was performed afterwards. Particular applications were represented using graphlets (see Figure 6.8), which are representations of the application's behaviour. The authors then used heuristics to refine the classification. The ground truth for the captured datasets

was established using a signature-based payload classification. Without using transport layer information, only the following traffic classes were identified: web, p2p, data (FTP, database), network management (DNS, SNMP, NTP), mail (SMTP, POP, IMAP), news (NNTP), chat (IRC, AIM, MSN messenger), streaming, and gaming.

Wright et al. [287] worked on the classification of traffic in encrypted tunnels, where multiple flows can be wrapped in a single flow representing the tunnel. The information from the packet headers was not applicable; therefore, the authors used only packet sizes, timing, and communication direction. A k-nearest neighbour classifier was used when all TCP connections in a set carried the same application protocol. When the TCP connections carried different application protocols, the authors used Hidden Markov Models. The authors also demonstrated that it is possible to determine the number of flows in an encrypted tunnel. Port numbers were used to obtain a ground truth for the captured dataset. The authors argue that mislabelled data only decreased the measured efficiency of their classification algorithm and that the real accuracy would therefore be even higher than the reported one. The classifiers were able to detect the following application protocols: HTTP, HTTPS, SMTP, AIM, FTP, SSH, and Telnet.

Koch and Rodosek [259] proposed a system for detecting interactive attacks using SSH. Packet sizes, IP addresses, and packet inter-arrival times were used to create clusters of packets which were likely to match an SSH command and its corresponding response. The SSH protocol was recognised based on the port number, and individual commands were identified from the clusters. Following this, sequences of commands were evaluated, and possible malicious sequences were reported. The system allows the definitions of malicious sequences to be customised using a sub-goal characterisation. Each sub-goal is mapped to a malicious event, such as data gathering or system manipulation. The results from the evaluation of the proposed method show that such identification is possible.

Khakpour and Liu [288] used the entropy of packet payloads for classifying traffic. The authors showed how to compute the entropy of files and how to modify the formula for on-the-fly computation. Several entropy values were computed for each packet. CART and support vector machine (SVM) methods were then used for classification based on the computed values. Their results demonstrated that the SVM methods provide comparable accuracy with fewer false positives. The authors argue that it is necessary to exclude application layer headers, such as HTTP response or picture headers, because computing the entropy over these headers leads to bias and misclassification of a packet. Therefore, a cut-off threshold was used to strip application headers from unknown protocols. The traffic was first classified into three categories: text, encrypted, and binary. The authors also postulated that the classification can be more fine-grained, and they investigated the classification of application protocols. They further demonstrated that it is possible to determine the encryption algorithm with higher accuracy than random guessing, which they found surprising.
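A minimal sketch of the payload-entropy feature used by both entropy-based works is shown below; the chunking and the cut-off handling of the original methods are omitted, and the function name is ours.

    import math
    from collections import Counter

    def byte_entropy(chunk: bytes) -> float:
        """Shannon entropy of a payload chunk in bits per byte (0 to 8)."""
        counts = Counter(chunk)
        total = len(chunk)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    # Encrypted or compressed payloads approach 8 bits per byte, plain text stays lower.
    print(byte_entropy(b"GET /index.html HTTP/1.1"))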

6.6.5 A Summary of Machine Learning and Statistical Encrypted Traffic Classification

We have provided a summarising overview of the feature-based traffic classification papers and the methods they use in Table 6.4. Where a method belongs to multiple categories, it is not marked as a hybrid; all the categories are listed instead. We find this approach more descriptive than using the hybrid category defined by the taxonomy.

Most of the surveyed methods use flow or packet header features as an input for the classification techniques. The authors of [269, 272] compare the results gained by utilising packet header features and flow features. They show that using both sets of features can result in faster and more accurate classification algorithms. Nevertheless, using all the available traffic features does not necessarily lead to the best classification performance, as demonstrated by the authors of [267].

Table 6.4: A Summary Table of Cited Papers and Methods They Use to Detect Encrypted Traffic.

The column Number of features shows how many flow or packet header features were used in each method. Some methods used different subsets of the features for different algorithms, which is denoted by the ⊆ mark. Moreover, some of the methods used a feature selection algorithm to select the best combination of features from the entire feature set. These methods are marked in the Feature selection column; in this case, the Number of features column represents the initial number of features.

Most classification algorithms are based on machine learning. The category of supervised machine learning algorithms is represented by Hidden Markov Models, RIPPER, AdaBoost, Support Vector Machines, C4.5, and Naive Bayes. Several works [268, 270, 271, 272] compare these algorithms to establish which is best for the task of classifying traffic. The C4.5 algorithm performs best in several cases; however, genetic programming is reported to achieve the best results in [272]. The second most common category is semi-supervised machine learning, which is dominated by clustering algorithms. The k-means algorithm is the most frequent in this category, and the k-nearest neighbour comes second. The popularity of k-means is due to its variability, which allows it to be fine-tuned for various purposes. It is often combined with genetic algorithms to find the best setting. The authors of [282, 288] use the entropy of packet payloads to classify traffic. Using simple statistical properties of the traffic is the third most common classification method. Other methods are rarely used, mainly because they cannot learn from labelled traffic and therefore require too much effort to set up.

The SSH protocol is heavily used as a classification example. The authors of [267, 268, 269, 270, 272, 266, 276, 277, 279] test their methods for recognising SSH and non-SSH traffic. Maiolini et al. [275] take the classification one step further and identify the type of traffic encapsulated in an SSH connection. The authors of [274, 285, 264] use SSL/TLS traffic and identify the underlying application protocols. Since the SSL/TLS protocol is more general and is used to encrypt various types of traffic, the complexity of identification is higher than for SSH. Another very popular protocol for identification is Skype, which is addressed by the authors of [270, 272, 283, 285].

Some of the methods focus only on identifying encrypted traffic, whereas others try to identify the underlying application protocol. The methods which perform a more thorough analysis to gain information about the application protocol are indicated in the column Encrypted protocol identification.

Because all methods, with the exception of [286], classify whole flows and rely mostly on flow features, they are rarely able to classify traffic in real time. However, the authors of [278, 274, 273, 275, 282] achieved near real-time classification by extracting features from only a fixed number of packets in a flow. They argue that the first packets carry enough information for classification. Using a higher number of packets increases accuracy; therefore, it is possible to strike a balance between accuracy and early identification.

The Dataset columns describe whether the data used to evaluate the presented methods were taken from a live network (Real) or generated by a tool (Artificial).
We also identify if the datasets were publicly available (Public), were made available by the authors (Published), or kept undisclosed as they contain sensitive information (Private). If more than one dataset was used for each evaluation, we simply performed a union of the dataset descriptions. The Ground truth column indicates how the ground truth was obtained for each dataset. Common methods are based on port numbers or signatures, which use the packet payload. When the datasets are generated manually, the ground truth is known in advance. The classification accuracy reported by authors of the surveyed methods depends heavily on the used datasets. All authors use their own private datasets, which are seldom published. Such methods simply cannot be compared without repeating the experiments on a common dataset. The authors of [268, 270, 272, 266, 264, 279] also used publicly available datasets which were either labelled beforehand, using payload when available, or simply labelled using port numbers. A combination of datasets is often used to test the robustness of the methods.


The surveyed methods show that a lot of effort has been put into classifying encrypted traffic. We believe that there are several points that should be taken into account in any future research in this field. First, identifying encrypted traffic is not enough; the identification of the underlying protocol is the real challenge. Second, the SSL/TLS protocol should be used as the reference protocol, as it can carry much more complex traffic than the SSH protocol. Finally, the traces used should be labelled and made available to other researchers. Following these points does not limit the scope of future research; however, it simplifies the comparison of the presented approaches and allows others to verify the results more easily.

6.7 Conclusions

In this chapter, we presented an overview of current approaches to the classification and analysis of encrypted traffic. First, we selected a number of the most widely used encryption protocols and described their packet structure and standard behaviour in a network. Second, we focused on the information which is provided by the encryption protocols themselves. We found that the initiation phase often provides information about the protocol version, the ciphers used, and the identity of at least one communicating party. Such information can be used to monitor and enforce security policies in an organisation. We also discovered that the use of information from the unencrypted parts of an encrypted connection for network anomaly detection has been only briefly investigated by researchers. Information about the communicating parties can be leveraged to discern the type of encrypted traffic. For example, the list of supported cipher suites provided by a client when establishing a secure connection can help to identify the client. We believe that the use of the unencrypted parts of the initiation of an encrypted connection should be explored in more detail.

Before starting the analysis of encrypted network traffic, it is necessary to identify it. Thus, we surveyed approaches to classifying network traffic. There are payload-based methods, which use knowledge of the packet structure, and feature-based methods, which use characteristics specific to the protocol flow. For payload-based classification, there are several open-source traffic classifiers which can identify encrypted traffic using pattern matching. The initiation of communication often has a strictly defined structure; therefore, patterns can be constructed for specific protocols. The main difference between the various classifiers is that some of them require traffic from both directions of the communication to classify the flows correctly.

Feature-based traffic classifiers have been intensively researched over the last decade. Many statistical and machine learning methods have been applied to the task of traffic classification. Despite this, there are no conclusive results to show which method has the best properties. The main reason is that the results depend heavily on the datasets used and the configuration of the methods. We have applied the multilevel taxonomy of Khalife et al. [232] and categorised the existing methods. Our results show that most of the authors use private datasets, sometimes in combination with public ones. For this reason, the individual results are not directly comparable. Most of the methods use supervised or semi-supervised machine learning algorithms to classify flows and even determine the application protocol of a given flow. Most methods target encryption protocols, such as SSH, SSL/TLS, and encrypted BitTorrent, and use similar techniques. However, there are also some novel works which apply innovative approaches to refine the classification up to deriving the content of the encrypted connections.

Most authors of feature-based classification methods claim that their approach is privacy sensitive as it does not require the traffic payload. However, privacy issues are much wider. In 2013, the Cyber-security Research Ethics Dialog & Strategy Workshop [289] started a discussion about the influence of cyber-security research on the privacy of Internet users.
Researchers need to keep in mind that their research activities have a significant impact on infrastructure security, network neutrality, and privacy of end users.


In the past, Internet protocols were not, or could not be, designed with security considerations in mind. The recent interest in privacy has motivated the IETF to reconsider this approach and discuss the privacy aspects of its protocols. Discussions held in [290] revealed that the privacy issues of monitoring are of great concern. This discussion resulted in a new RFC [17], in which the IETF clearly states that pervasive monitoring is considered to be an attack. The document suggests that the IETF's protocols should be hardened against such monitoring. It is clear that the struggle between the demand for privacy and the need for security is only beginning.

Chapter 7

Next Generation Flow Monitoring

The work presented in this thesis so far aimed to improve the state of flow monitoring. We focused on including more information in flow records and on accelerating flow monitoring for high-speed networks. The flow creation process and the structure of flow records adhered to the definitions provided in Chapters 2 and 3. The goal of this chapter is to show how flow monitoring might evolve in the future. We present three novel concepts that might be applied to flow monitoring to increase the amount of information carried by flow records. All concepts also require adapting flow data processing to a certain degree. Moreover, the definitions of flows given in this thesis no longer hold for these novel concepts.

The first proposed concept is called EventFlow and is based on the assumption that each user action involving the network usually generates more than one flow. For example, simple access to a web page might create a DNS query as well as an HTTP query. The main goal of EventFlow is to connect the flows that are part of a single action or event. We have implemented a prototype for the measurement of EventFlow and provide its evaluation in this chapter.

The second concept addresses the monitoring of tunnelled traffic. Currently, each flow that represents a tunnel is split into multiple flows based on the communication in that tunnel. However, this approach usually supports only a static number of encapsulation layers. We propose another approach based on a hierarchical tree of flows.

Similarly to the monitoring of tunnelled traffic, application events can also cause a flow to be split into several smaller flows. This causes statistics relying on the number of flows, as well as per-flow metrics, to be inaccurate. The last concept proposes to handle application layer events separately from flow records. Each event would be associated with the original flow; however, it would not influence its creation at all.

The paper included in this chapter is [A15]. The organisation of this chapter is as follows:

• Section 7.1 describes the design and evaluation of the EventFlow concept.
• Section 7.2 presents and discusses the MetaFlow concept.
• Section 7.3 introduces the concept of application events in flow.
• Section 7.4 summarizes the chapter.


7.1 EventFlow

Application flow monitoring parses data from application headers and adds application-specific elements to flow records. This way, the information from the application level can be easily transferred to flow collectors, stored, and utilised together with the information about the network communication. The current approach is to treat separate application protocols individually, e.g., to develop an application processing module for each monitored protocol, as shown in Chapter 3. However, connections between different protocols are lost in this scenario. For example, when a user wants to access a web page, several different flow records are created. The DNS server must be contacted to resolve the hostname of the web page to an IP address. After the basic document is loaded, the user's browser automatically loads linked content, such as images, cascading style sheets, and JavaScript libraries. The generated requests are recorded as flows; however, only little of the relation between the flows is preserved.

Information about the relations between individual flows can be useful in several scenarios. First, when an advertisement on a web page contains malware, the page can be traced using the relation and notified of the malicious content. Second, aggregates of the related flows can be created to simplify the behavioural analysis of network traffic. Moreover, the analysis can use the additional information to improve its accuracy. Last, traffic classification engines can also benefit from having access to information about flow relations [291].

In this section, we present a flow monitoring extension, called EventFlow, which allows keeping track of the relations between HTTP and DNS application flows. Information about the flow relations is inserted into flow records to keep track of individual user actions, i.e., events. We develop a prototype of the EventFlow extension and evaluate its properties on a network traffic trace from an ISP network. The results show that at least 10 % of HTTP and DNS flow records form complex events. We believe that this is only a lower bound and that further improvements can be made to relate even more flows into events.

The rest of the section is structured as follows. Related work is surveyed in Subsection 7.1.1. We propose the architecture of EventFlow measurement in Subsection 7.1.2. Subsection 7.1.3 describes the implementation of the EventFlow prototype. An experimental evaluation of the EventFlow prototype is performed in Subsection 7.1.4. The section is concluded in Subsection 7.1.5.

7.1.1 Related Work

Madhyastha and Krishnamurthy [292] propose a generic language for application-specific flow sampling. Their language allows applications to select flows with special properties so that the negative impact of sampling on these applications is minimised. This can be useful for intrusion detection systems or traffic classification applications. Although the goal of this work is different from ours, it also aims to improve the collected data so that traffic analysis applications can achieve higher accuracy.

The authors of [293] also focus on improving the quality of sampled flow data. They show that traffic classification accuracy can be increased using related sampling, which assigns a higher probability to connections that are part of the same application. The authors propose to use the source IP address as a measure of the relation between connection sessions.

Hu et al. [294] propose an entropy-based aggregation system to mitigate the impact of DoS attacks and worm spreads on a network monitoring system. The main contribution of their approach is a flow key attribute selection algorithm that chooses key attributes by which the flows are aggregated. A two-dimensional hash table is used to implement their approach. The aggregated flows are called metaflows. The main difference from EventFlow is that we label existing flows belonging to the same user action, while the metaflow is a substitute flow for many flows created during malicious network activity.



Figure 7.1: Relations between HTTP and DNS Requests and Responses.

Dolberg et al. [295] introduce a multidimensional flow aggregation aimed at reducing the volume of collected data. The authors use tree structures for storing the data by a chosen dimension such as IP addresses or ports. The EventFlow proposed in our work might be used in this scenario to aggregate flows belonging to the same events.

The usual approach to reducing the volume of collected data is to use sampling. Estan et al. [59] propose to use an adaptive sampling rate to achieve the highest possible accuracy within given data collection constraints. Their main contribution is a system for renormalisation of flow entries after the sampling rate is changed. The authors propose an extension to standard flow counting that increases the accuracy of the counters for sampled flows.

7.1.2 EventFlow Architecture

This subsection describes the architecture of the EventFlow monitoring. The goal is to label all flows that are the result of a single user action with the same event identifier (EID). For example, accessing http://www.w3.org/ creates 1 DNS request, 37 HTTP requests, and 8 HTTPS requests. We aim to assign a single unique EID to the flows generated for all of these DNS and HTTP requests.

Four basic types of flows are recognised by the EventFlow: HTTP requests, HTTP responses, DNS requests, and DNS responses. There are relations between these types of flows in network traffic, as shown in Figure 7.1. When an HTTP request to a new site is performed, the IP address of the site must be resolved first. Therefore, a DNS request is created. After the request is observed, a reply usually follows, which results in the relation 1). After the DNS reply arrives, the client knows the IP address of the server and makes the HTTP request, which creates the relation 2). An HTTP response follows the request, as indicated by the relation 3). The HTTP response can contain an HTML page which links to several additional resources such as external style sheets or images. The loading of these resources triggers more HTTP requests, resulting in relation 4). When these requests point to previously unresolved domains, new DNS requests are created, which introduces the relation 5).

We base the EventFlow architecture on the relations between the requests and responses. When an HTTP or a DNS flow is encountered, we must make sure that it is assigned the same EID as the related flows. Therefore, we create four sets of records: expected HTTP requests, expected HTTP responses, expected DNS requests, and expected DNS responses. When processing an HTTP or a DNS flow, we add a new record to the set or sets it relates to. Then, when the next flow is processed, it is matched against the appropriate expected set to see whether it is a part of an existing event. If it is, the EID of the event is assigned to the flow record. For example, when a DNS response is encountered, a new record is put into the expected HTTP requests set (because of relation 2; see Figure 7.1). Then, when an HTTP request is processed, we check the expected HTTP requests set to see whether we are expecting this request based on a previous DNS response. If the request is matched, it is assigned the same EID as the DNS response.

Each of the sets of expected records uses different flow properties to match a flow record.

                        Source IP   URL   Destination Port   Destination IP   Domain   DNS Transaction ID
Expected HTTP Request       ✓        ✓                                           ✓
Expected HTTP Reply         ✓                    ✓                  ✓
Expected DNS Request        ✓                                                    ✓
Expected DNS Reply          ✓                    ✓                  ✓                           ✓

Table 7.1: Matched Flow Properties.

When matching an HTTP request flow against the expected HTTP requests set, the source IP address of the flow must match, as well as the requested domain or the URL, if available. Checking the source IP address ensures that flows from different hosts are not combined into a single event. A domain name is checked for the records that were inserted in the set when a DNS reply was encountered. In case an HTTP response caused the record to be inserted, the full URL is available, not only the domain name. Replies are checked based on IP addresses and destination port. The source port is not checked since the services are expected to run on standard, well-known ports. The DNS reply is also checked for the transaction ID, which is a unique identifier tying the request and response. However, the DNSSEC extension is ignored and does not affect the EventFlow. Therefore, any malicious responses would still be part of an event. The list of the used properties is provided in Table 7.1.

Expiration of the records from the expected sets must be ensured. When a record from any of the expected sets is matched, it is removed. However, many inserted records will never be matched. For example, when a DNS request is made to accommodate a different service than HTTP, the expected HTTP request might never appear. We need to free such records from the sets eventually. A timeout is used to keep the expected sets from being congested by redundant records. A timestamp is assigned to each record upon insertion to a set. Then, records older than the timeout are removed each time the set is searched. The timeout should be as short as possible to avoid blending of several events. However, it should be at least as long as it takes to process the longest user action, which might be up to a couple of seconds in the case of complicated queries to slow sites.

There are several caveats to our approach and some limitations of the architecture that should be addressed in the future. Our approach does not handle HTTP redirection codes; therefore, the first request and the HTTP 3xx redirection response are assigned a different EID than the subsequent request to the resource. This problem can be rectified simply by adding a handler for the HTTP 3xx redirection responses that will put a new record with the redirect URL into the expected HTTP requests set.

Another limitation of our approach is that the URLs are only extracted from HTML documents. However, modern websites often use JavaScript code to request additional resources through the Ajax technique. Such requests cannot be easily matched to an event since it would require reconstructing the complete web page and processing the included JavaScript code, which is infeasible for the flow monitoring system.

There are also several caveats that cannot be avoided. Some of the requested documents might be cached by clients, which would cause EventFlow to lose track of related URLs. However, cached DNS queries are of no consequence to the EventFlow since no traffic is generated for them and no information about a flow relation is lost. Actions of different users can be mingled when Network Address Translation is used. Finally, the growing deployment of HTTPS reduces the usefulness of the EventFlow for the HTTP protocol. Nevertheless, it can always be used in environments utilising an HTTPS proxy, such as data centres or enterprise networks.
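A minimal sketch in Python of the matching and expiration logic described above might look as follows. The ExpectedSet class, the composition of the matching key, and the TIMEOUT value are illustrative assumptions and do not correspond to the prototype code.

    import time

    TIMEOUT = 5.0  # assumed timeout in seconds; should cover the longest user action

    class ExpectedSet:
        """One set of expected records, e.g. the expected HTTP requests set."""

        def __init__(self):
            self.records = []  # list of (key, eid, inserted_at) tuples

        def insert(self, key, eid):
            self.records.append((key, eid, time.monotonic()))

        def match(self, key):
            """Return the EID of a matching record and consume it, or None."""
            now = time.monotonic()
            # records older than the timeout are removed each time the set is searched
            self.records = [r for r in self.records if now - r[2] <= TIMEOUT]
            for i, (stored_key, eid, _) in enumerate(self.records):
                if stored_key == key:
                    del self.records[i]
                    return eid
            return None

    expected_http_requests = ExpectedSet()
    _next_eid = 0

    def _new_eid():
        global _next_eid
        _next_eid += 1
        return _next_eid

    def process_dns_response(client_ip, domain, eid):
        # relation 2): after a DNS response, expect an HTTP request
        # from the same client to the resolved domain
        expected_http_requests.insert((client_ip, domain), eid)

    def process_http_request(client_ip, domain):
        # reuse the EID of the matching DNS response, or start a new event
        eid = expected_http_requests.match((client_ip, domain))
        return eid if eid is not None else _new_eid()

In the actual architecture, separate sets with different matching keys, including the full URL when it is available from an HTTP response, are maintained for HTTP requests, HTTP responses, DNS requests, and DNS responses, as summarised in Table 7.1.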



Figure 7.2: EventFlow Prototype Schema.

7.1.3 EventFlow Prototype

We build the EventFlow prototype as a plugin for the FlowMon [99] flow monitoring software. The capabilities of the flow exporter are utilised to create a prototype EventFlow extension plugin. The prototype can either be deployed to process data on a live network or to analyse captured samples.

The FlowMon exporter consists of three main components, as shown in Figure 7.2. The first component is a packet parser. It receives packets from the network and extracts information from the network to application layers of each packet. The extracted information is used to create a partial flow record, which contains all necessary information about the parsed packet, such as IP addresses, ports, timestamps, and a byte counter. It also contains application layer information when application parsing is performed. The partial flow record is passed to the second component of the exporter, which is a flow cache. The partial record is either inserted as a new flow record or used to update an existing flow record, which is an aggregation of previous partial flow records. When a flow record expires, it is released from the flow cache to an export component. The purpose of the export component is to convert raw flow records to a flow export protocol format such as NetFlow [21] or IPFIX [37] and pass the flow records over the network for further processing.

Plugins that extend the FlowMon exporter to process specific application layer protocols, such as DNS or HTTP, have access to several parts of the flow creation and export process. Each plugin can request to see the raw packet payload, process it, and add its own information to flow records, such as HTTP Host, Content-Type, or Response Code. Furthermore, the plugins are allowed to provide their own functionality for the insert, update, and release methods of the flow cache. Lastly, each plugin defines how the information inserted into flow records is processed by the export component.

The EventFlow prototype is implemented as an application protocol extension. However, it also utilises the data provided by other application plugins. The EventFlow combines information from the DNS and HTTP protocols to detect relations between flows; therefore, it requires the DNS and HTTP application plugins to be deployed as well. The prototype extends the packet parser to extract URLs from HTML pages. These URLs are sent together with the partial flow record to the flow cache. However, they are used only internally and are never exported in the flow records. When a new flow record is created in the flow cache, the expected sets (see Subsection 7.1.2) are searched for a match to the new record using the DNS, HTTP, and URL information provided in the partial record.

If a match is found, the new flow record is assigned the Event ID (an 8-byte unsigned integer) of the matched record from the expected sets. Otherwise, if no match is found, a new EID is incrementally assigned to the new flow record. After the flow is expired from the cache, the EID is a part of the record and is sent by the export component along with the rest of the flow record.

Using an 8-byte integer for the EID and assigning it incrementally to individual events ensures that there are no collisions due to EID overflow in practice. However, an assignment that is individual for each flow probe and persistent over reboots of the system would be required for a real-world deployment. The EID is assigned only to flows of the HTTP and DNS traffic since it would provide no benefit to other flows, as event relation tracking is not implemented for other protocols yet. Moreover, the size of the flows grows only by 8 bytes at maximum, which has a negligible impact on the flow collector disk space requirements.
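The points at which the prototype hooks into the flow creation process can be illustrated with the following skeleton. The callback names (on_insert, on_export) and the record layout are hypothetical; the actual FlowMon plugin API is not documented here, and the sketch only shows where the EID is assigned and exported.

    class EventFlowPlugin:
        """Illustrative plugin skeleton; the callback names and record layout
        are assumptions, not the actual FlowMon plugin API."""

        def __init__(self, expected_sets):
            self.expected_sets = expected_sets  # the four expected sets from Subsection 7.1.2
            self.next_eid = 0

        def on_insert(self, partial_record, flow_record):
            # called when a new flow record is created in the flow cache;
            # search the expected sets using the DNS, HTTP, and URL information
            eid = self.expected_sets.match(partial_record)
            if eid is None:
                self.next_eid += 1        # incremental assignment of the 8-byte counter
                eid = self.next_eid
            flow_record["eventID"] = eid  # carried in the flow record until export
            self.expected_sets.update(partial_record, eid)  # add newly expected records

        def on_export(self, flow_record, export_record):
            # called when the flow expires; the EID leaves with the rest of the record
            export_record["eventID"] = flow_record["eventID"]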

7.1.4 Experimental Evaluation

We evaluate the prototype in two scenarios. First, we assess the functionality of the prototype on a simple example web site. Once we have verified the functionality, we run EventFlow on a packet trace from a live network to determine how many flows can be joined into events in real traffic. The IPFIXcol [A12] flow collector is used to collect and process the generated flows. The main advantage of the collector is that it can be easily configured to work with the new Event ID element.

We do not evaluate the performance of the prototype in this phase. We are aware of several performance inefficiencies that need to be solved before any valuable results can be measured. For example, one of the most expensive parts of the prototype is the management of the sets of expected records. We expect that changing the underlying data structures will significantly improve the performance.

Functional Evaluation

For the first scenario, we create a simple website with two pages, each linking the other page, displaying an image, and referencing a different JavaScript library. The evaluation proceeds as follows. We request the first page in a browser and, a few seconds after it loads, we follow the link to the other page. The packet trace of these actions is recorded and processed by the EventFlow prototype, and the resulting flows are collected by the IPFIXcol.

We expect to see a flow record for each of the requests and responses. However, due to HTTP pipelining, the whole communication with the web server hosting the test pages is done using a single connection. Therefore, there is a pair of flows for accessing the two web pages with the linked images (which were on the same server), two pairs of flows for each off-site JavaScript library, and three pairs of DNS flows for IP address resolution. There are 12 flows created in this test scenario in total.

The 12 flows are divided into two events by the EventFlow prototype. The first event contains the flows for the two DNS requests, the HTTP communication with the web server, and the download of the first JavaScript library. The second event does not contain an HTTP flow due to the HTTP pipelining but contains the DNS request and the subsequent download of the second JavaScript library.

The functional evaluation shows that the prototype correctly recognises related flows and labels them as a part of the same event. The flow exporter can be extended to handle HTTP pipelining by creating a new flow record for each pipelined request. Such an extension would make the measurement more accurate.


Total Flows                        613,953
HTTP Requests                       33,294
HTTP Responses                      49,753
DNS Requests                       197,926
DNS Responses                      224,588
Events with > 1 Flow                28,064
Flows in Events with > 1 Flow       55,881
All Events                         388,749
Flows in All Events                418,671

Table 7.2: Real Traffic Evaluation Statistics.

Real Traffic Evaluation

The purpose of the real traffic test is to determine how many flows can be joined into events. We collect a short (approximately one minute) trace of 10 million packets from an ISP network on ports 53 and 80, which are likely to carry DNS and HTTP packets. Table 7.2 shows statistics that describe the packet trace as well as the results of the evaluation. We can see that, from the total number of more than 600 thousand flows, more than 400 thousand are part of events. Furthermore, over 55 thousand flows are part of events which contain more than one flow. Therefore, we can conclude that more than 10 % of observed HTTP and DNS flows are recognised as a part of more complex events by the EventFlow prototype.

The number of flows in complex events is not as high as might be expected given the large number of HTTP and DNS requests and responses. We believe that this is caused by the quite short time window of our trace, which is likely to have captured a large number of separate responses and requests. Moreover, we believe that better results can be achieved by fine-tuning the timeout of the records in the expected sets of the EventFlow prototype.
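The 10 % figure appears to relate the flows in multi-flow events to all observed HTTP and DNS flows; under that assumption, the values in Table 7.2 give

\[
  \frac{55\,881}{33\,294 + 49\,753 + 197\,926 + 224\,588} = \frac{55\,881}{505\,561} \approx 0.11,
\]

i.e., roughly 11 % of the HTTP and DNS flows belong to events with more than one flow, which is consistent with the stated lower bound.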

7.1.5 Conclusions

We have presented the EventFlow monitoring architecture, which allows keeping track of relations between HTTP and DNS application flows and can be used to simplify behavioural analysis of network traffic and improve network threat detection and network traffic classification. The changes to the existing flow monitoring architecture are negligible, which facilitates wide deployment. The relation between flows is encoded as an 8-byte unsigned integer called Event ID, which is shared by the related flows. The proposed architecture can be further extended to handle more complex HTTP communication, such as redirection return codes.

A prototype of the EventFlow plugin for the FlowMon flow exporter has been evaluated on a trace of 10 million packets. We showed that more than 10 % of observed HTTP and DNS flows are recognised as a part of more complex events by our prototype. We believe that this result will improve on a longer packet trace as well as with more accurate settings of the prototype. Prospective improvements to the prototype, as well as its more detailed evaluation, including a performance evaluation, are left for future work.

We believe that network analysis will benefit from the supplemental information about flow relations. Our work has shown that it is possible to acquire such information without a significant impact on an existing monitoring architecture and that it is possible to extend the flow monitoring to trace relations of other application protocols.


Packet #1: Ethernet | IPv4 (src 192.168.1.1, dst 192.168.1.2) | IPv4 (src 10.0.0.1, dst 10.0.0.2) | TCP | Application
Packet #2: Ethernet | IPv4 (src 192.168.1.1, dst 192.168.1.2) | IPv4 (src 10.1.1.3, dst 10.1.1.4) | TCP | Application

Application flow records:

src IPv4       dst IPv4       src IPv4    dst IPv4    packets
192.168.1.1    192.168.1.2    10.0.0.1    10.0.0.2    1
192.168.1.1    192.168.1.2    10.1.1.3    10.1.1.4    1

Tree structure of flow records:

src IPv4       dst IPv4       packets    ID    parent ID
192.168.1.1    192.168.1.2    2          #1    -
10.0.0.1       10.0.0.2       1          #2    #1
10.1.1.3       10.1.1.4       1          #3    #1

Figure 7.3: MetaFlow Tree Structure.

7.2 MetaFlow

The MetaFlow concept aims to provide support for capturing relations between flows observed in encapsulated traffic, such as tunnels. The widely used approach for monitoring encapsulated traffic is to combine flow keys of different layers to create a new flow key. This flow key splits the flow containing the encapsulated traffic into multiple flows based on the encapsulated traffic. An example of this approach is shown in Figure 7.3.

We propose to use a more generic approach to handle the encapsulated traffic. In our approach, when encapsulated traffic is encountered, each layer is treated as a separate flow. Therefore, when a tunnel contains two different communications, we create three different flows. To ensure that the relation between the encapsulation layers is preserved, each flow contains ID and Parent ID elements, as shown in Figure 7.3. Therefore, the encapsulated traffic is represented by a tree structure.

The main benefit of the MetaFlow approach is that the structure of flows remains the same for an arbitrary number of encapsulation layers. Current flow exporters usually store only the innermost and outermost layers while ignoring any intermediate layers altogether. By providing a uniform way to describe more complex packet structures, we can gain more information about the traffic. Moreover, the flow records are easier to process as well. For example, it is simple to select all encapsulated flows by selecting all flows with a non-empty Parent ID.

The uniform description allows running queries and algorithms on encapsulated traffic with no additional changes in the flow data processing. When encapsulated traffic is monitored nowadays, both sets of IP addresses that are present in flow records must be handled, which creates complicated conditions in the code. However, by simplifying the flow records, this problem is removed. Another advantage of the simplified flow records is the lowered number of templates created during flow export. Currently, each layer of encapsulation creates a whole new set of templates that match the templates generated without encapsulation. For example, IPv4 packets with ICMP, UDP, and TCP transport layer protocols usually create three different templates. When the IPv4 header encapsulates another IPv4 protocol header, three new templates are created for the ICMP, UDP, and TCP protocols observed in the encapsulated traffic. Handling a large number of templates is inefficient both for flow export and flow data processing. MetaFlow removes this inefficiency, as no new templates are added at all in this scenario.
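A minimal sketch of the hierarchical record structure, reproducing the example from Figure 7.3, could look as follows. The field names (flow_id, parent_id) are illustrative and do not represent a proposed IPFIX encoding.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class MetaFlowRecord:
        src_ip: str
        dst_ip: str
        packets: int
        flow_id: int                  # unique identifier of this flow
        parent_id: Optional[int]      # identifier of the encapsulating flow, None for the outer layer

    # the example from Figure 7.3: one outer IPv4 flow carrying two encapsulated IPv4 flows
    records = [
        MetaFlowRecord("192.168.1.1", "192.168.1.2", 2, flow_id=1, parent_id=None),
        MetaFlowRecord("10.0.0.1",    "10.0.0.2",    1, flow_id=2, parent_id=1),
        MetaFlowRecord("10.1.1.3",    "10.1.1.4",    1, flow_id=3, parent_id=1),
    ]

    # selecting all encapsulated flows reduces to a simple predicate on Parent ID
    encapsulated = [r for r in records if r.parent_id is not None]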

154 7.3. Application Events and Flow

To implement MetaFlow, changes would need to be made both to flow data creation and flow data processing. The main modification is that each flow needs to have a unique ID. There are several options for generating a unique identifier for each flow during the flow creation process:

• A sequence number,

• a high-precision timestamp,

• a hash.

Using a sequence number is fine as long as only a single flow cache is utilised. A shared sequence number means an unnecessary synchronisation point for multiple flow caches, which could decrease performance. Moreover, when the exporter is restarted, the numbers would have to continue in the sequence, which is hard to implement in the case of sudden reboots and crashes. If the sequence was restarted, the flow collector would have to detect it and renumber the flows, which would lead to consistency problems.

Using high-precision timestamps is fine as long as consecutive packets are guaranteed to receive different timestamps. If this condition is met, each flow can simply get its start timestamp as its ID. For example, the precision needed for a 100 Gbps network would need to be in nanoseconds. Another option is to utilise the same hash that is used to identify the flow during the flow creation process. However, the hash needs to change when the flow is exported and never be used again.

Therefore, the best option seems to be using high-precision timestamps. When they are not available, a combination of the hash mixed together with a timestamp should suffice. Once a flow is received by a collector, the ID and Parent ID need to be used together with an identification of the flow exporter. Otherwise, the flows might be mixed, as there is no guarantee that each exporter generates unique identifiers.

The implementation of flow data processing would need to be reviewed to ensure compatibility with the MetaFlow concept. However, we expect that only minimal changes to flow processing algorithms would need to be done to handle the new format.
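A sketch of the fallback option, i.e., mixing the flow-key hash with a timestamp when unique nanosecond timestamps are not available, might look as follows. The exact bit layout and the CRC-32 choice are assumptions made for illustration only.

    import time
    import zlib

    def flow_identifier(flow_key: bytes) -> int:
        """Derive a 64-bit flow ID: a second-resolution timestamp in the upper
        32 bits, a CRC-32 of the flow key in the lower 32 bits. Two flows collide
        only if they share the key hash and are created within the same second."""
        seconds = int(time.time()) & 0xFFFFFFFF
        key_hash = zlib.crc32(flow_key) & 0xFFFFFFFF
        return (seconds << 32) | key_hash

    # example: a flow key built from the 5-tuple of the outer layer
    fid = flow_identifier(b"192.168.1.1|192.168.1.2|6|51512|80")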

7.3 Application Events and Flow

A concept similar to MetaFlow can be applied to application events as well. Currently, an application flow can be split into multiple flows based on the application content, e.g., multiple HTTP requests on a single connection end up in separate flows. The result is the same as for encapsulated traffic. Statistical properties of basic flows are broken by the application layer in these cases. Moreover, mixing together the basic flow, which focuses on the network and transport layers, and the application data forces too narrow a view of the available data. We propose to separate application layer data from the basic flow records in the same fashion as MetaFlow separates encapsulated layers from the parent flow. This would effectively decouple application events from the basic flows.

There are three major benefits to our proposal. Firstly, the basic flows remain untouched whatever the application layer information is. This allows detection and statistical methods to rely on the properties of the basic flows without taking into account possible transformation by the application layer. Moreover, when the application layer is not necessary for flow processing, it can easily be filtered out without modification of each flow record. This might lead to major performance benefits for the flow data processing systems.

Secondly, the flow exporter does not have to hold the application information in the flow cache. The flow cache can be implemented efficiently with small flow records without variable fields and dynamically allocated memory. The processing of application events would be moved to packet processing. If the event has no duration, it can be exported directly without delay.

We assume that the associated flow identifier is known from the time the flow record is created in the flow cache and therefore can be assigned to the application event at any time. If the application event has a duration, the packet parser must store it in an appropriate application-specific cache until the time it is completed and exported. We expect this mode of data processing at the flow exporter to be significantly more effective than the current mode, which holds all application data in the flow cache together with the flow records.

The last major advantage is in the processing of the application events. Since each application event is exported separately, the number of templates is significantly reduced. Instead of many combinations of network, transport, and application layers, we only have combinations of network and transport layers and then separate templates for application events. Moreover, this separation can simplify the processing of the flows as well. For example, when a detection module requires only data from the HTTP and DNS protocols, it can easily select only the two templates to work with.

The proposed approach has one notable disadvantage. The flow processing systems are going to receive application events before the flow records expire. This might cause them to buffer the data and wait for the flow records. However, basic information about each flow, such as IP addresses, transport protocol, and port numbers, can be included in the application event messages as well. This should facilitate unhindered processing of the flow events while the flow records are being aggregated. Moreover, detection methods using application events could react much faster than those utilising application flow records.

Since the application data are effectively decoupled from the basic flows, we can quite easily inject data from another source, such as application logs, into the data stream. The injected data can share the same template with the application events, which would facilitate the shared processing. Moreover, based on timestamps and network information, the injected data could be attached to the basic flows as well. There are several reasons to do this. Firstly, the logs can provide ground truth for the application layer data. Secondly, having a single system processing all available data from the network and host systems helps to improve the quality of data and the accuracy of various detection algorithms. Lastly, using a single system to process data from multiple sources reduces management and running costs.
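The decoupling can be illustrated with a small sketch: a basic flow record keeps only network- and transport-layer fields, while an application event carries its own fields plus the identifier of the flow it belongs to. The structures and field names below are illustrative assumptions, not a proposed IPFIX template.

    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class BasicFlow:
        # network and transport layers only; no variable-length application fields
        flow_id: int
        src_ip: str
        dst_ip: str
        protocol: int
        src_port: int
        dst_port: int
        packets: int = 0
        octets: int = 0

    @dataclass
    class ApplicationEvent:
        # exported separately, possibly before the associated flow expires
        flow_id: int                  # reference to the basic flow
        event_type: str               # e.g. "http-request" or "dns-query"
        fields: Dict[str, str] = field(default_factory=dict)

    # one HTTP connection with two requests: a single basic flow and two events
    flow = BasicFlow(flow_id=42, src_ip="192.168.1.1", dst_ip="192.168.1.2",
                     protocol=6, src_port=51512, dst_port=80)
    events = [
        ApplicationEvent(flow_id=42, event_type="http-request",
                         fields={"host": "www.example.org", "path": "/"}),
        ApplicationEvent(flow_id=42, event_type="http-request",
                         fields={"host": "www.example.org", "path": "/style.css"}),
    ]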

7.4 Summary

We have proposed three novel concepts for application flow monitoring in this chapter. Each of our proposals could be implemented separately; however, we believe that a combination of all proposed approaches is not only possible but would provide the greatest benefits for application flow monitoring.

The first proposed concept has been EventFlow. It is an extension of application flow monitoring which allows preserving relations between HTTP and DNS application flows that are a part of a single user action, most typically browsing a web page. We have described the architecture of the EventFlow extension and its limitations. A prototype implementation of EventFlow has been introduced and evaluated on a packet trace from an ISP network. We have shown that a significant number of flow records can be recognised as a part of a single user action.

MetaFlow has been proposed as an approach to the monitoring of encapsulated traffic. We have argued that the current practice of extending flow records with information from each layer is not flexible and has several disadvantages. MetaFlow aims to create a hierarchy of encapsulated flows so that each flow is able to retain a simple structure while the information about the relations between flows is preserved. We have also discussed how to create a unique flow identifier, which is necessary for building the MetaFlow tree structure.


The last proposal has expanded the MetaFlow concept to application layer events. By decoupling application events from basic flow records, we can significantly simplify both the flow creation process and the flow data processing systems. Moreover, it allows us to solve issues produced by the forceful splitting of flows based on application payload. We believe that separating application events from basic flow monitoring is the future of application flow monitoring. Moreover, this approach provides new opportunities for novel research, not only in flow monitoring but also in flow data processing systems.


Chapter 8

Conclusion

Flow monitoring has evolved significantly in the past several years. The most substantial progress has been made in the performance of flow monitoring, which needed to match the development of networking toward 100 Gbps technologies, and in the processing of application layer information. This thesis has presented our contributions in both of these areas.

8.1 Research Goals

We have established the following three research goals in the introduction of this thesis:

• Propose application flow monitoring which utilises application layer information to facilitate flow analysis and threat detection.

• Evaluate performance of flow monitoring and propose optimisations to facilitate monitoring of high-speed networks.

• Analyse options for monitoring of encrypted traffic, survey common encryption protocols and methods for encrypted traffic classification.

To address the first goal, we have proposed and defined application-aware flow monitoring (called simply application flow monitoring). We started by analysing the current definition of flow, provided an improved alternative, and formalised our definition. Based on the definition of flow, we have proposed a definition of application flow. Since the use of the terms flow, IP flow, and application flow differs in the literature, we have introduced a consistent terminology to deal with the potentially confusing differences. After that, we have described the application-aware flow monitoring system and pointed out the important differences from the basic flow.

To show the value of application flow monitoring for traffic analysis, we have analysed four different use cases. The first, most common use case has utilised extended information from HTTP headers to detect new classes of attack on the application layer. The second use case has shown how IPv6 tunnelled traffic can be observed and analysed with application flow monitoring. We have demonstrated that the newly acquired information helps us to understand the network traffic better. The third use case has added geolocation information to flow records to aid traffic analysis, and the fourth discussed how different samples of network traffic might be compared and experiments repeated on a live network.

Our work on the second research goal has included both application and basic flow monitoring. We have designed and implemented application flow monitoring for the HTTP protocol and evaluated its performance. Multiple approaches to the design of the HTTP application parser were taken into consideration and utilised. By comparing the results of different parser implementations, we have shown that the performance of an application flow monitoring system is significantly affected by the design of application parsers. Moreover, the composition of the traffic matters even more than for the basic flow, as the percentage of the given application in the traffic affects the performance substantially.

We have discussed how the design of the flow monitoring process affects the performance. Performing application analysis for multiple protocols at speeds of and beyond 100 Gbps is not possible without heavy acceleration using specialised NICs [223], and even then it depends on the traffic composition. We have built a high-density flow monitoring system, which could process traffic from sixteen 10 Gbps links, and analysed its performance. We compared settings with and without hardware acceleration for different packet sizes and different numbers of simultaneous flows. Our results have shown that full line rate can be achieved on real networks. We believe that the results could be improved by applying more of the described optimisations.

The last of our research goals has aimed to address the growing use of encryption, which prevents analysis of application payloads. We have created a comprehensive overview of methods for classification and analysis of encrypted traffic. Firstly, we have described the most widely used encryption protocols and shown that each communication starts with an unencrypted phase which allows us to gather important data. Moreover, the extensive use of X.509 certificates contributes to the amount of information that can be obtained from the unencrypted phase. This information can be included in flow records and used for analysis of encrypted traffic. The second part of our work has surveyed methods for classification and analysis of encrypted traffic. We have discovered that most of the methods described in the literature use supervised or semi-supervised machine learning algorithms to classify flows and even determine the exact application protocol for each flow. Most methods target the SSH, SSL/TLS, and BitTorrent encryption protocols. Although we have categorised the described classification methods, we did not attempt to apply them directly to the flow monitoring process. Evaluation of the classification methods for use with flow monitoring remains as future work.

8.2 Further Research

Even though this thesis has significantly contributed to the state of the art of flow monitoring, the changing nature of network traffic continually opens new research possibilities. We list the following topics as interesting for future research:

• Flow monitoring at speeds of 400 Gbps and beyond. Commodity hardware is likely not going to be able to process this amount of data in the near future without a significant change of the packet capture paradigm or the use of hardware acceleration. Building a flow monitoring system for rates of hundreds of gigabits per second is definitely a challenging topic.

• The use of encrypted traffic classification as a part of flow monitoring is an increasingly interesting topic due to the continuous rise in encrypted traffic volume. The main challenge is to find an accurate method that is fast enough to be used at high speeds on live networks.

• An increasing number of applications are being hosted using cloud services. However, the customer cannot easily deploy flow monitoring in the cloud to receive flow data. Therefore, the challenge is to propose a flow monitoring solution that enables users to receive the necessary flow data, possibly as a part of the service. Host-based flow monitoring extended by information from the system and application logs also seems to be a viable and interesting option.

• Flow monitoring in a virtual environment is an active research topic nowadays. Although most proposals to use the OpenFlow protocol are not suitable for high-speed networks and do not produce quality flow data, it is an opportunity for future research.


• With the increasing amount of processed data, the number of flow records and their lengths increase as well. The flow data processing systems require more and more performance and the use of distributed architectures to be able to cope with the data. We propose to evaluate how the flow data are used by the flow data processing systems, then create a loopback to the flow monitoring systems and eliminate the creation and processing of the data that is dispensable. By studying the interaction between the flow monitoring and flow data processing systems, we can optimise both processes to be not only cheaper but also more accurate.

The concepts introduced in Chapter 7 should help with several of the proposed research topics. Especially the separation of application events and flow records should prove useful for host-based monitoring of virtual hosts in a cloud and analysis of requirements of the flow data processing systems.


Bibliography

Authored publications referenced in the thesis

[A1] Petr Velan. “Improving Network Flow Definition: Formalization and Applicability”. In: NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. 2018, Accepted for publication, 5 pages (page 5).
[A2] Petr Velan. “Practical Experience with IPFIX Flow Collectors”. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013). Ghent, Belgium: IEEE Xplore Digital Library, May 2013, pp. 1021–1026. isbn: 978-1-4673-5229-1 (page 29).
[A3] Petr Velan, Milan Čermák, Pavel Čeleda, and Martin Drašar. “A Survey of Methods for Encrypted Traffic Classification and Analysis”. In: International Journal of Network Management 25.5 (2015), pp. 355–374. doi: 10.1002/nem.1901 (pages 30, 127).
[A4] Petr Velan, Tomáš Jirsík, and Pavel Čeleda. “Design and Evaluation of HTTP Protocol Parsers for IPFIX Measurement”. In: Advances in Communication Networking. Vol. 8115. Heidelberg: Springer Berlin Heidelberg, 2013, pp. 136–147. isbn: 978-3-642-40551-8 (pages 35, 57).
[A5] Petr Velan and Pavel Čeleda. “Next Generation Application-Aware Flow Monitoring”. English. In: Monitoring and Securing Virtualized Networks and Services. Vol. 8508. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2014, pp. 173–178. isbn: 978-3-662-43861-9 (page 35).
[A6] Martin Husák, Petr Velan, and Jan Vykopal. “Security Monitoring of HTTP Traffic Using Extended Flows”. In: 2015 10th International Conference on Availability, Reliability and Security. August 2015, pp. 258–265 (page 53).
[A7] Martin Elich, Petr Velan, Tomáš Jirsík, and Pavel Čeleda. “An Investigation Into Teredo and 6to4 Transition Mechanisms: Traffic Analysis”. In: IEEE 38th Conference on Local Computer Networks Workshops (LCN Workshops). Sydney, Australia: IEEE Xplore Digital Library, October 2013, pp. 1046–1052. isbn: 978-1-4799-0540-9 (page 53).
[A8] Pavel Čeleda, Petr Velan, Martin Rábek, Rick Hofstede, and Aiko Pras. “Large-Scale Geolocation for NetFlow”. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013). Ghent, Belgium: IEEE Xplore Digital Library, May 2013, pp. 1015–1020. isbn: 978-1-4673-5229-1 (page 53).
[A9] Petr Velan, Jana Medková, Tomáš Jirsík, and Pavel Čeleda. “Network Traffic Characterisation Using Flow-Based Statistics”. In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. April 2016, pp. 907–912 (page 53).
[A10] Luuk Hendriks, Petr Velan, Ricardo de Oliveira Schmidt, Pieter-Tjerk de Boer, and Aiko Pras. “Flow-Based Detection of IPv6-specific Network Layer Attacks”. In: Security of Networks and Services in an All-Connected World: 11th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Management, and Security, AIMS 2017, Zurich, Switzerland, July 10-13, 2017, Proceedings. Cham: Springer International Publishing, 2017, pp. 137–142. isbn: 978-3-319-60774-0 (page 53).
[A11] Luuk Hendriks, Petr Velan, Ricardo de Oliveira Schmidt, Pieter-Tjerk de Boer, and Aiko Pras. “Threats and Surprises behind IPv6 Extension Headers”. In: 2017 Network Traffic Measurement and Analysis Conference (TMA). IEEE Xplore Digital Library, June 2017, pp. 1–9 (page 53).
[A12] Petr Velan and Radek Krejčí. “Flow Information Storage Assessment Using IPFIXcol”. In: Dependable Networks and Services. Vol. 7279. Lecture Notes in Computer Science. Heidelberg: Springer Berlin Heidelberg, 2012, pp. 155–158. isbn: 978-3-642-30632-7 (pages 57, 70, 152).


[A13] Petr Velan and Viktor Puš. “High-Density Network Flow Monitoring”. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). May 2015, pp. 996–1001 (page 99).
[A14] Viktor Puš, Petr Velan, Lukáš Kekely, Jan Kořenek, and Pavel Minařík. “Hardware Accelerated Flow Measurement of 100 Gb Ethernet”. eng. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). Ottawa, Canada: IEEE Xplore Digital Library, May 2015, pp. 1147–1148 (page 99).
[A15] Petr Velan. “EventFlow: Network Flow Aggregation Based on User Actions”. In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. April 2016, pp. 767–771 (page 147).

Other publications referenced in the thesis

[1] Cisco Systems, Inc., San Jose, CA and USA. Cisco IOS NetFlow and Security. February 2005. url: http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ ps6586/ps6642/prod_presentation0900aecd80311f49.pdf (Accessed on 27th April 2017) (pages 1, 7). [2] Ward van Wanrooij and Aiko Pras. “Data on Retention”. In: Proceedings of the 16th IFIP/IEEE Ambient Networks International Conference on Distributed Systems: Operations and Management. DSOM’05. Barcelona, Spain: Springer-Verlag, 2005, pp. 60–71. isbn: 978-3-540-29388-0 (page 1). [3] Center for Strategic and International Studies. The Economic Impact of Cubercrime and Cyber Espionage. July 2013. url: http://csis.org/files/publication/60396rpt_ cybercrime-cost_0713_ph4_0.pdf (Accessed on 2nd March 2018) (page 1). [4] Cisco Systems, Inc., San Jose, CA and USA. Cisco IOS Flexible NetFlow. December 2008. url: http://www.cisco.com/c/en/us/products/collateral/ios- nx- os- softw are/flexible-netflow/product_data_sheet0900aecd804b590b.html (Accessed on 27th April 2017) (pages 1, 7). [5] Cisco Systems, Inc., San Jose, CA and USA. Network Based Application Recognition (NBAR). url: https://www.cisco.com/c/en/us/products/ios-nx-os-software/network- based-application-recognition-nbar/index.html (Accessed on 2nd March 2018) (pages 1, 136). [6] Jessica Steinberger, Lisa Schehlmann, Sebastian Abt and Harald Baier. “Anomaly Detec- tion and Mitigation at Internet Scale: A Survey”. In: Proceedings of the 7th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Management, and Security: Emerging Management Mechanisms for the Future Internet - Volume 7943. AIMS’13. Barcelona, Spain: Springer-Verlag, 2013, pp. 49–60. isbn: 978-3-642-38997-9 (page 1). [7] Daniela Brauckhoff, Bernhard Tellenbach, Arno Wagner, Martin May and Anukool Lakhina. “Impact of Packet Sampling on Anomaly Detection Metrics”. In: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement. IMC ’06. Rio de Janeriro, Brazil: ACM, 2006, pp. 159–164. isbn: 1-59593-561-4 (pages 1, 107). [8] Rick Hofstede, Idilio Drago, Anna Sperotto, Ramin Sadre and Aiko Pras. “Measure- ment Artifacts in NetFlow Data”. In: Passive and Active Measurement. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 1–10. isbn: 978-3-642-36516-4 (page 1). [9] Cisco Systems, Inc., San Jose, CA and USA. Cisco Visual Networking Index: Forecast and Methodology, 2016–2021. June 2017. url: https://www.cisco.com/c/en/us/solutions/ collateral/service-provider/visual-networking-index-vni/complete-white- paper-c11-481360.pdf (Accessed on 2nd March 2018) (page 1).


[10] ntop. Unveiling Application Visibility in ntop and nProbe (both in NetFlow v9 and IPFIX). November 2011. url: http://www.ntop.org/nprobe/unveiling- application- vi sibility- in- ntop- and- nprobe- both- in- netflow- v9- and- ipfix/ (Accessed on 27th September 2017) (pages 2, 37). [11] Yves Younan. 25 Years of Vulnerabilities: 1988-2012. 2013. url: https : / / courses . cs . washington.edu/courses/cse484/14au/reading/25-years-vulnerabilities.pdf (Accessed on 2nd March 2018) (page 2). [12] Luca Deri. “Passively Monitoring Networks at Gigabit Speeds Using Commodity Hard- ware and Open Source Software”. In: Proceedings of the Passive and Active Measurement Conference. 2003, pp. 1–7 (page 2). [13] Nevena Vratonjic, Julien Freudiger, Vincent Bindschaedler and Jean-Pierre Hubaux. “The Inconvenient Truth About Web Certificates”. In: Economics of Information Security and Privacy III. New York, NY: Springer New York, 2013, pp. 79–117. isbn: 978-1-4614- 1981-5 (page 2). [14] Internet Security Research Group (ISRG). Let’s Encrypt Stats. March 2018. url: https: //letsencrypt.org/stats/ (Accessed on 2nd March 2018) (page 2). [15] Rick Hofstede, Pavel Čeleda, Brian Trammell, Idilio Drago, Ramin Sadre, Anna Sperotto and Aiko Pras. “Flow Monitoring Explained: From Packet Capture to Data Analysis with NetFlow and IPFIX”. In: Communications Surveys Tutorials, IEEE PP.99 (2014), pp. 2037– 2064. issn: 1553-877X. doi: 10.1109/COMST.2014.2321898 (pages 5, 16, 26, 37, 56, 86, 89, 101). [16] C. Mills, D. Hirsh and G.R. Ruth. Internet Accounting: Background. RFC 1272 (Informa- tional). RFC. Fremont, CA, USA: RFC Editor, November 1991. url: https://www.rfc- editor.org/rfc/rfc1272.txt (page 6). [17] S. Farrell and H. Tschofenig. Pervasive Monitoring Is an Attack. RFC 7258 (Best Current Practice). RFC. Fremont, CA, USA: RFC Editor, May 2014. url: https : / / www . rfc - editor.org/rfc/rfc7258.txt (pages 6, 146). [18] Kimberly C. Claffy, Hans-Werner Braun and George C. Polyzos. “A Parameterizable Methodology for Internet Traffic Flow Profiling”. In: IEEE Journal on Selected Areas in Communications 13.8 (October 1995), pp. 1481–1494. issn: 0733-8716. doi: 10.1109/49. 464717 (page 6). [19] N. Brownlee, C. Mills and G. Ruth. Traffic Flow Measurement: Architecture. RFC 2722 (In- formational). RFC. Fremont, CA, USA: RFC Editor, October 1999. url: https://www. rfc-editor.org/rfc/rfc2722.txt (page 6). [20] Cisco Systems, Inc., San Jose, CA and USA. NetFlow Services Solutions Guide. January 2007. url: http://www.cisco.com/en/US/docs/ios/solutions_docs/netflow/ nfwhite.html (Accessed on 27th April 2017) (pages 6–8, 86). [21] B. Claise. Cisco Systems NetFlow Services Export Version 9. RFC 3954 (Informational). RFC. Fremont, CA, USA: RFC Editor, October 2004. url: https://www.rfc- editor.org/ rfc/rfc3954.txt (pages 7, 10, 151). [22] The Internet Engineering Steering Group. IP Flow Information Export (ipfix) Charter. url: http://datatracker.ietf.org/wg/ipfix/charter/ (Accessed on 27th April 2017) (page 7). [23] The Internet Engineering Steering Group. IP Flow Information Export Charter. September 2001. url: https://www.ietf.org/mail- archive/web/ipfix/current/msg00213. html (Accessed on 27th April 2017) (page 7). [24] J. Quittek, T. Zseby, B. Claise and S. Zander. Requirements for IP Flow Information Export (IPFIX). RFC 3917 (Informational). RFC. Fremont, CA, USA: RFC Editor, October 2004. url: https://www.rfc-editor.org/rfc/rfc3917.txt (pages 7, 24).


[25] S. Leinen. Evaluation of Candidate Protocols for IP Flow Information Export (IPFIX). RFC 3955 (Informational). RFC. Fremont, CA, USA: RFC Editor, October 2004. url: https: //www.rfc-editor.org/rfc/rfc3955.txt (page 8). [26] Brian Trammell and Elisa Boschi. “An Introduction to IP Flow Information Export (IP- FIX)”. In: IEEE Communications Magazine 49.4 (April 2011), pp. 89–95. issn: 0163-6804. doi: 10.1109/MCOM.2011.5741152 (page 8). [27] B. Trammell and E. Boschi. Bidirectional Flow Export Using IP Flow Information Export (IPFIX). RFC 5103 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, January 2008. url: https://www.rfc-editor.org/rfc/rfc5103.txt (pages 8, 24). [28] G. Sadasivan, N. Brownlee, B. Claise and J. Quittek. Architecture for IP Flow Information Export. RFC 5470 (Informational). RFC. Updated by RFC 6183. Fremont, CA, USA: RFC Editor, March 2009. url: https://www.rfc-editor.org/rfc/rfc5470.txt (pages 8, 15–17, 20, 22, 23). [29] E. Boschi, L. Mark and B. Claise. Reducing Redundancy in IP Flow Information Export (IP- FIX) and Packet Sampling (PSAMP) Reports. RFC 5473 (Informational). RFC. Fremont, CA, USA: RFC Editor, March 2009. url: https://www.rfc-editor.org/rfc/rfc5473.txt (page 8). [30] T. Dietz, A. Kobayashi, B. Claise and G. Muenz. Definitions of Managed Objects for IP Flow Information Export. RFC 5815 (Proposed Standard). RFC. Obsoleted by RFC 6615. Fre- mont, CA, USA: RFC Editor, April 2010. url: https://www.rfc- editor.org/rfc/ rfc5815.txt (page 8). [31] T. Dietz, A. Kobayashi, B. Claise and G. Muenz. Definitions of Managed Objects for IP Flow Information Export. RFC 6615 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, June 2012. url: https://www.rfc-editor.org/rfc/rfc6615.txt (page 8). [32] P. Aitken, B. Claise, S. B S, C. McDowall and J. Schoenwaelder. Exporting MIB Variables Using the IP Flow Information Export (IPFIX) Protocol. RFC 8038 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, May 2017. url: https://www.rfc-editor.org/rfc/ rfc8038.txt (page 8). [33] A. Kobayashi and B. Claise. IP Flow Information Export (IPFIX) Mediation: Problem State- ment. RFC 5982 (Informational). RFC. Fremont, CA, USA: RFC Editor, August 2010. url: https://www.rfc-editor.org/rfc/rfc5982.txt (page 8). [34] A. Kobayashi, B. Claise, G. Muenz and K. Ishibashi. IP Flow Information Export (IPFIX) Mediation: Framework. RFC 6183 (Informational). RFC. Fremont, CA, USA: RFC Editor, April 2011. url: https://www.rfc-editor.org/rfc/rfc6183.txt (pages 8, 14, 15). [35] E. Boschi and B. Trammell. IP Flow Anonymization Support. RFC 6235 (Experimental). RFC. Fremont, CA, USA: RFC Editor, May 2011. url: https://www.rfc-editor.org/ rfc/rfc6235.txt (page 8). [36] G. Muenz, B. Claise and P. Aitken. Configuration Data Model for the IP Flow Information Export (IPFIX) and Packet Sampling (PSAMP) Protocols. RFC 6728 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, October 2012. url: https://www.rfc- editor. org/rfc/rfc6728.txt (page 8). [37] B. Claise, B. Trammell and P. Aitken. Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of Flow Information. RFC 7011 (Internet Standard). RFC. Fremont, CA, USA: RFC Editor, September 2013. url: https : / / www . rfc - editor . org / rfc / rfc7011.txt (pages 9, 10, 15, 16, 26, 28, 34, 57, 151). [38] Nevil Brownlee. “Flow-Based Measurement: IPFIX Development and Deployment”. In: IEICE Transactions on Communications E94.B.8 (September 2011), pp. 2190–2198. doi: 10. 1587/transcom.E94.B.2190 (pages 9, 16). 
[39] Peter Phaal. sFlow Version 5. 2004. url: http://sflow.org/sflow_version_5.txt (Accessed on 27th April 2017) (page 9).


[40] P. Phaal, S. Panchen and N. McKee. InMon Corporation’s sFlow: A Method for Monitoring Traffic in Switched and Routed Networks. RFC 3176 (Informational). RFC. Fremont, CA, USA: RFC Editor, September 2001. url: https://www.rfc-editor.org/rfc/rfc3176. txt (page 9). [41] Internet Engineering Task Force. Packet Sampling (psamp) WG. url: https://datatrack er.ietf.org/wg/psamp/ (Accessed on 27th April 2017) (page 9). [42] The Internet Engineering Steering Group. Packet Sampling Charter. url: https://datat racker.ietf.org/doc/charter-ietf-psamp/ (Accessed on 27th April 2017) (page 9). [43] T. Dietz, B. Claise, P. Aitken, F. Dressler and G. Carle. Information Model for Packet Sampling Exports. RFC 5477 (Proposed Standard). RFC. Fremont, CA, USA: RFC Ed- itor, March 2009. url: https://www.rfc-editor.org/rfc/rfc5477.txt (page 9). [44] T. Zseby, M. Molina, N. Duffield, S. Niccolini and F. Raspall. Sampling and Filtering Tech- niques for IP Packet Selection. RFC 5475 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, March 2009. url: https : / / www . rfc - editor . org / rfc / rfc5475 . txt (page 9). [45] B. Claise, A. Johnson and J. Quittek. Packet Sampling (PSAMP) Protocol Specifications. RFC 5476 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, March 2009. url: https: //www.rfc-editor.org/rfc/rfc5476.txt (pages 9, 22). [46] Open Networking Foundation. OpenFlow Switch Specification. September 2012. url: htt ps://www.opennetworking.org/images/stories/downloads/sdn-resources/onf- specifications/openflow/openflow- spec- v1.3.1.pdf (Accessed on 27th April 2017) (page 9). [47] Sanjeev Singh and Rakesh Kumar Jha. “A Survey on Software Defined Networking: Ar- chitecture for Next Generation Network”. In: J. Netw. Syst. Manage. 25.2 (April 2017), pp. 321–374. issn: 1064-7570. doi: 10.1007/s10922-016-9393-9 (page 9). [48] Fei Hu, Qi Hao and Ke Bao. “A Survey on Software-Defined Network and OpenFlow: From Concept to Implementation”. In: IEEE Communications Surveys Tutorials 16.4 (2014), pp. 2181–2206. issn: 1553-877X. doi: 10.1109/COMST.2014.2326417 (page 9). [49] Curtis Yu, Cristian Lumezanu, Yueping Zhang, Vishal Singh, Guofei Jiang and Harsha V. Madhyastha. “FlowSense: Monitoring Network Utilization with Zero Measurement Cost”. In: Proceedings of the 14th International Conference on Passive and Active Measurement. PAM’13. Hong Kong, China: Springer-Verlag, 2013, pp. 31–41. isbn: 978-3-642-36515-7 (page 9). [50] Luuk Hendriks, Ricardo de Oliveira Schmidt, Ramin Sadre, Jeronimo A. Bezerra and Aiko Pras. Assessing the quality of flow measurements from OpenFlow devices. April 2016 (page 9). [51] José Suárez-Varela and Pere Barlet-Ros. “Towards a NetFlow Implementation for Open- Flow Software-Defined Networks”. In: 2017 29th International Teletraffic Congress (ITC 29). Vol. 1. September 2017, pp. 187–195 (page 9). [52] Tomas Cejka, Vaclav Bartos, Lukas Truxa and Hana Kubatova. “Using Application- Aware Flow Monitoring for SIP Fraud Detection”. In: Intelligent Mechanisms for Network Configuration and Security: 9th IFIP WG 6.6 International Conference on Autonomous Infra- structure, Management, and Security, AIMS 2015, Ghent, Belgium, June 22-25, 2015. Proceed- ings. Cham: Springer International Publishing, 2015, pp. 87–99. isbn: 978-3-319-20034-7 (page 16). [53] Vojtech Krmicek, Jan Vykopal and Radek Krejci. “Netflow Based System for NAT De- tection”. In: Proceedings of the 5th International Student Workshop on Emerging Networking Experiments and Technologies. Co-Next Student Workshop ’09. 
Rome, Italy: ACM, 2009, pp. 23–24. isbn: 978-1-60558-751-6 (pages 16, 20).


[54] Byungjoon Lee, Hyeongu Son, Seunghyun Yoon and Youngseok Lee. “End-to-end Flow Monitoring with IPFIX”. In: Proceedings of the 10th Asia-Pacific Conference on Network Operations and Management Symposium: Managing Next Generation Networks and Services. APNOMS’07. Sapporo, Japan: Springer-Verlag, 2007, pp. 225–234. isbn: 978-3-540-75475- 6 (page 16). [55] Youngseok Lee, Seongho Shin, Soonbyoung Choi and Hyeon-gu Son. “IPv6 Anomaly Traffic Monitoring with IPFIX”. In: Proceedings of the Second International Conference on Internet Monitoring and Protection. ICIMP ’07. Washington, DC, USA: IEEE Computer Society, 2007, pp. 10–. isbn: 0-7695-2911-9 (page 16). [56] Maurizio Molina, Agostino Chiosi, Salvatore D’Antonio and Giorgio Ventre. “Design Principles and Algorithms for Effective High-speed IP Flow Monitoring”. In: Comput. Commun. 29.10 (June 2006), pp. 1653–1664. issn: 0140-3664. doi: 10 . 1016 / j . comcom . 2005.07.024 (pages 16, 22, 24, 114). [57] Jian Zhang and Andrew Moore. “Traffic Trace Artifacts due to Monitoring Via Port Mirroring”. In: E2EMON. IEEE Computer Society, 2007, pp. 1–8. isbn: 1-4244-1289-7 (page 18). [58] Sebastian Abt, Christian Dietz, Harald Baier and Slobodan Petrović. “Passive Remote Source NAT Detection Using Behavior Statistics Derived from Netflow”. In: Proceed- ings of the 7th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Manage- ment, and Security: Emerging Management Mechanisms for the Future Internet - Volume 7943. AIMS’13. Barcelona, Spain: Springer-Verlag, 2013, pp. 148–159. isbn: 978-3-642-38997-9 (page 20). [59] Cristian Estan, Ken Keys, David Moore and George Varghese. “Building a Better Net- Flow”. In: SIGCOMM Comput. Commun. Rev. 34.4 (August 2004), pp. 245–256. issn: 0146- 4833. doi: 10.1145/1030194.1015495 (pages 22, 149). [60] Dawei Wang, Yibo Xue and Yingfei Dong. “Memory-Efficient Hypercube Flow Table for Packet Processing on Multi-Cores”. In: 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011. December 2011, pp. 1–6 (pages 22, 113). [61] Cisco Systems, Inc., San Jose, CA and USA. Cisco IOS Flexible NetFlow Command Reference. November 2013. url: http://www.cisco.com/c/en/us/td/docs/ios/fnetflow/ command/reference/fnf_book/fnf_01.html (Accessed on 21st June 2017) (page 23). [62] Juan Moliná Rodriguez, Valentín Carela Español, Pere Barlet Ros, Ralf Hoffmann and Klaus Degner. “Empirical Analysis of Traffic to Establish a Profiled Flow Termination Timeout”. In: 2013 9th International Wireless Communications and Mobile Computing Con- ference (IWCMC). July 2013, pp. 1156–1161 (pages 23, 114). [63] David Murray and Terrys Koziniec. “The State of Enterprise Network Traffic in 2012”. In: 2012 18th Asia-Pacific Conference on Communications (APCC). October 2012, pp. 179–184 (page 24). [64] Tsung-Huan Cheng, Ying-Dar Lin, Yuan-Cheng Lai and Po-Ching Lin. “Evasion Tech- niques: Sneaking through Your Intrusion Detection/Prevention Systems”. In: IEEE Com- munications Surveys Tutorials 14.4 (October 2012), pp. 1011–1020. issn: 1553-877X. doi: 10.1109/SURV.2011.092311.00082 (page 24). [65] Michał Trojnara. Stunnel. July 2015. url: https : / / www . stunnel . org / index . html (Accessed on 3rd August 2017) (page 29). [66] Bingdong Li, Jeff Springer, George Bebis and Mehmet Hadi Gunes. “Review: ASurvey of Network Flow Applications”. In: J. Netw. Comput. Appl. 36.2 (March 2013), pp. 567– 581. issn: 1084-8045. doi: 10.1016/j.jnca.2012.12.020 (page 29). [67] Muhammad Fahad Umer, Muhammad Sher and Yaxin Bi. 
“Flow-based intrusion detection: Techniques and challenges”. In: Computers & Security 70 (2017), pp. 238–254. issn: 0167-4048. doi: http://dx.doi.org/10.1016/j.cose.2017.05.009 (pages 29, 30).

[68] Timothy Shimeall, Sidney Faber, Markus DeShon and Andrew Kompanek. “Using SiLK for Network Traffic Analysis”. In: CERT R Network Situational Awareness Group, Carnegie Mellon University (2014) (page 29). [69] Peter Haag. NFDUMP. December 2014. url: http://nfdump.sourceforge.net/ (Ac- cessed on 3rd August 2017) (page 29). [70] Rick Hofstede, Anna Sperotto, Tiago Fioreze and Aiko Pras. “The Network Data Hand- ling War: MySQL vs. NfDump”. In: Networked Services and Applications - Engineering, Control and Management. Vol. 6164. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010, pp. 167–176 (page 29). [71] Clinton Gormley and Zachary Tong. Elasticsearch: The Definitive Guide. 1st. O’Reilly Me- dia, Inc., 2015. isbn: 978-1449358549 (page 29). [72] Yeonhee Lee and Youngseok Lee. “Toward Scalable Internet Traffic Measurement and Analysis with Hadoop”. In: SIGCOMM Comput. Commun. Rev. 43.1 (January 2012), pp. 5– 13. issn: 0146-4833. doi: 10.1145/2427036.2427038 (page 29). [73] Peter Haag. NfSen. December 2011. url: http://nfsen.sourceforge.net/ (Accessed on 4th August 2017) (pages 30, 78, 90). [74] Tomas Jirsik, Milan Cermak, Daniel Tovarnak and Pavel Celeda. “Toward Stream-Based IP Flow Analysis”. In: IEEE Communications Magazine 55.7 (2017), pp. 70–76. issn: 0163- 6804. doi: 10.1109/MCOM.2017.1600972 (page 30). [75] Milan Čermák, Daniel Tovarňák, Martin Laštovička and Pavel Čeleda. “A Performance Benchmark for NetFlow Data Analysis on Distributed Stream Processing Systems”. In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. April 2016, pp. 919–924 (page 30). [76] Martin Laštovička and Pavel Čeleda. “Situational Awareness: Detecting Critical Depend- encies and Devices in a Network”. In: Security of Networks and Services in an All-Connected World: 11th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Management, and Security, AIMS 2017, Zurich, Switzerland, July 10-13, 2017, Proceedings. Cham: Springer International Publishing, 2017, pp. 173–178. isbn: 978-3-319-60774-0 (page 30). [77] Varun Chandola, Arindam Banerjee and Vipin Kumar. “Anomaly Detection: A Survey”. In: ACM Comput. Surv. 41.3 (July 2009), 15:1–15:58. issn: 0360-0300. doi: 10.1145/15418 80.1541882 (page 30). [78] Nino Vincenzo Verde, Giuseppe Ateniese, Emanuele Gabrielli, Luigi Vincezo Mancini and Spognardi Angelo. “No NAT’d User Left Behind: Fingerprinting Users behind NAT from NetFlow Records Alone”. In: 2014 IEEE 34th International Conference on Distributed Computing Systems. June 2014, pp. 218–227 (page 30). [79] Chih-Fong Tsai, Yu-Feng Hsu, Chia-Ying Lin and Wei-Yang Lin. “Intrusion detection by machine learning: A review”. In: Expert Systems with Applications 36.10 (2009), pp. 11994– 12000. issn: 0957-4174. doi: http://dx.doi.org/10.1016/j.eswa.2009.05.029 (page 30). [80] Robin Sommer and Vern Paxson. “Outside the Closed World: On Using Machine Learn- ing for Network Intrusion Detection”. In: 2010 IEEE Symposium on Security and Privacy. May 2010, pp. 305–316 (page 30). [81] Internet Assigned Numbers Authority. IP Flow Information Export (IPFIX) Entities. September 2017. url: https : / / www . iana . org / assignments / ipfix / ipfix . xhtml (Accessed on 10th August 2017) (pages 33, 34). [82] N. Spring, D. Wetherall and D. Ely. Robust Explicit Congestion Notification (ECN) Signaling with Nonces. RFC 3540 (Experimental). RFC. Fremont, CA, USA: RFC Editor, June 2003. url: https://www.rfc-editor.org/rfc/rfc3540.txt (page 34).

[83] B. Trammell and P.Aitken. Revision of the tcpControlBits IP Flow Information Export (IPFIX) Information Element. RFC 7125 (Informational). RFC. Fremont, CA, USA: RFC Editor, Feb- ruary 2014. url: https://www.rfc-editor.org/rfc/rfc7125.txt (page 34). [84] Ming Gao, Kenong Zhang and Jiahua Lu. “Efficient packet matching for gigabit network intrusion detection using TCAMs”. In: Advanced Information Networking and Applications, 2006. AINA 2006. 20th International Conference on. Vol. 1. 2006, pp. 1–6 (page 36). [85] Haiguang Lai, Shengwen Cai, Hao Huang, Junyuan Xie and Hui Li. “A Parallel Intru- sion Detection System for High-Speed Networks”. In: Applied Cryptography and Network Security. Vol. 3089. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2004, pp. 439–451. isbn: 978-3-540-22217-0 (page 36). [86] Thuy T.T. Nguyen and Grenville Armitage. “A Survey of Techniques for Internet Traffic Classification Using Machine Learning”. In: Communications Surveys & Tutorials, IEEE 10.4 (October 2008), pp. 56–76. issn: 1553-877X. doi: 10 . 1109 / SURV . 2008 . 080406 (pages 36, 128). [87] Alberto Dainotti, Antonio Pescape and Kimberly C. Claffy. “Issues and Future Direc- tions in Traffic Classification”. In: Network, IEEE 26.1 (January 2012), pp. 35–40. issn: 0890-8044. doi: 10.1109/MNET.2012.6135854 (pages 36, 128). [88] Michael Finsterbusch, Chris Richter, Eduardo Rocha, Jean-Alexander Muller and Klaus Hanssgen. “A Survey of Payload-Based Traffic Classification Approaches”. In: Commu- nications Surveys & Tutorials, IEEE 16.2 (May 2014), pp. 1135–1156. issn: 1553-877X. doi: 10.1109/SURV.2013.100613.00161 (pages 36, 128, 137). [89] Ruoming Pang, Vern Paxson, Robin Sommer and Larry Peterson. “binpac: A yacc for Writing Application Protocol Parsers”. In: Proceedings of the 6th ACM SIGCOMM confer- ence on Internet measurement. IMC ’06. Rio de Janeriro, Brazil: ACM, 2006, pp. 289–300. isbn: 1-59593-561-4 (pages 36, 44). [90] Juan Caballero, Heng Yin, Zhenkai Liang and Dawn Song. “Polyglot: Automatic Extrac- tion of Protocol Message Format Using Dynamic Binary Analysis”. In: Proceedings of the 14th ACM Conference on Computer and Communications Security. CCS ’07. Alexandria, Vir- ginia, USA: ACM, 2007, pp. 317–329. isbn: 978-1-59593-703-2 (page 36). [91] Weidong Cui, Jayanthkumar Kannan and Helen J. Wang. “Discoverer: Automatic Pro- tocol Reverse Engineering from Network Traces”. In: Proceedings of 16th USENIX Secur- ity Symposium on USENIX Security Symposium. SS’07. Boston, MA: USENIX Association, 2007, 14:1–14:14. isbn: 111-333-5555-77-9 (page 36). [92] Drew Davidson, Randy Smith, Nic Doyle and Somesh Jha. “Protocol Normalization Us- ing Attribute Grammars”. In: Proceedings of the 14th European Conference on Research in Computer Security. ESORICS’09. Saint-Malo, France: Springer-Verlag, 2009, pp. 216–231. isbn: 978-3-642-04443-4 (page 36). [93] Robin Sommer, Johanna Amann and Seth Hall. “Spicy: A Unified Deep Packet Inspec- tion Framework for Safely Dissecting All Your Data”. In: Proceedings of the 32Nd Annual Conference on Computer Security Applications. ACSAC ’16. Los Angeles, California, USA: ACM, 2016, pp. 558–569. isbn: 978-1-4503-4771-6 (page 37). [94] Christopher M. Inacio and Brian Trammell. “YAF: Yet Another Flowmeter”. In: Proceed- ings of the 24th International Conference on Large Installation System Administration. LISA’10. San Jose, CA: USENIX Association, 2010, pp. 1–16 (pages 37, 46). [95] CERT Network Situational Awareness Group Engineering Team. yaf application labeling. 
url: https://tools.netsa.cert.org/yaf/applabel.html (Accessed on 27th September 2017) (page 37). [96] Emily Sarneso and the CERT Network Situational Awareness Group Engineering Team. yaf deep packet inspection. url: https://tools.netsa.cert.org/yaf/yafdpi.html (Accessed on 27th September 2017) (page 37).

[97] Luca Deri. “nProbe: an Open Source NetFlow Probe for Gigabit Networks”. In: Proceed- ings of the TERENA Networking Conference, TNC’03. Zagreb, Croatia, 2003 (page 37). [98] Luca Deri, Maurizio Martinelli, Tomasz Bujlow and Alfredo Cardigliano. “nDPI: Open- source high-speed deep packet inspection”. In: Wireless Communications and Mobile Com- puting Conference (IWCMC), 2014 International. Nicosia, Cyprus, August 2014, pp. 617– 622 (pages 37, 137). [99] Flowmon Networks. Flowmon Probe. url: https://www.flowmon.com/en/products/ flowmon/probe (Accessed on 27th September 2017) (pages 38, 45, 56, 68, 79, 117, 151). [100] Shane Alcock and Richard Nelson. Libprotoident: Traffic Classification Using Lightweight Packet Inspection. Tech. rep. WAND Network Research Group, 2012. url: http://www. wand.net.nz/~salcock/lpi/lpi.pdf (Accessed on 28th April 2017) (pages 38, 110, 137). [101] Cisco Systems, Inc., San Jose, CA and USA. Cisco Application Visibility and Control (AVC). url: https://www.cisco.com/c/en/us/products/routers/avc- control.html (Accessed on 27th September 2017) (page 38). [102] Cisco Systems, Inc., San Jose, CA and USA. NBAR2 Protocol Library. June 2013. url: ht tps://www.cisco.com/c/en/us/products/collateral/ios- nx- os- software/ network-based-application-recognition-nbar/product_bulletin_c25-627831. html (Accessed on 27th September 2017) (page 38). [103] Lancope. Stealthwatch FlowSensor. url: https://www.lancope.com/products/stealth watch-flowsensor (Accessed on 27th September 2017) (page 38). [104] Palo Alto Networks. Next-Generation Firewall. url: https : / / www . paloaltonetworks .com/products/secure- the- network/next- generation- firewall (Accessed on 27th September 2017) (page 38). [105] Internet Assigned Numbers Authority. Service Name and Transport Protocol Port Number Registry. October 2017. url: https://www.iana.org/assignments/service-names- port-numbers/service-names-port-numbers.xhtml (Accessed on 12th October 2017) (page 40). [106] Tomasz Bujlow, Valentín Carela-Espanol and Pere Barlet-Ros. “Independent Compar- ison of Popular DPI Tools for Traffic Classification”. In: Computer Networks 76.0 (2015), pp. 75–89. doi: http://dx.doi.org/10.1016/j.comnet.2014.11.001 (pages 40, 137). [107] GNU Project. The GNU C Library (glibc). August 2017. url: http : / / www . gnu . org / software/libc/ (Accessed on 18th October 2017) (page 43). [108] Philip Hazel. PCRE – Perl Compatible Regular Expressions. 2015. url: http://www.pcre. org/ (Accessed on 18th October 2017) (page 43). [109] Open Information Security Foundation. Suricata – network IDS, IPS and network security monitoring engine. October 2017. url: http://www.suricata- ids.org (Accessed on 18th October 2017) (page 43). [110] Martin Roesch. “Snort - Lightweight Intrusion Detection for Networks”. In: Proceedings of the 13th USENIX conference on System administration. LISA ’99. Seattle, Washington: USENIX Association, 1999, pp. 229–238 (page 43). [111] Qualys, Inc. LibHTP – security-aware parser for the HTTP protocol. July 2017. url: http: //github.com/ironbee/libhtp (Accessed on 18th October 2017) (page 43). [112] Jason Bittel. httpry - HTTP logging and information retrieval tool. October 2014. url: http: //github.com/jbittel/httpry (Accessed on 18th October 2017) (page 43). [113] Vern Paxson. “Bro: a system for detecting network intruders in real-time”. In: Comput. Netw. 31.23-24 (December 1999), pp. 2435–2463. issn: 1389-1286. doi: 10.1016/S1389- 1286(99)00112-7 (page 44). [114] John Levine and Levine John. Flex & Bison. 
1st. O’Reilly Media, Inc., 2009. isbn: 978-0596155971 (page 44).

[115] Mike E. Lesk and Eric Schmidt. Lex – a Lexical Analyzer Generator. Tech. rep. Computing Science Technical Report No. 39. Bell Laboratories, 1975 (page 44). [116] Robert McNaughton and Hisao Yamada. “Regular Expressions and State Graphs for Automata”. In: Electronic Computers, IRE Transactions on EC-9.1 (1960), pp. 39–47. issn: 0367-9950. doi: 10.1109/TEC.1960.5221603 (page 44). [117] Fabian Schneider, Sachin Agarwal, Tansu Alpcan and Anja Feldmann. “The new web: characterizing AJAX traffic”. In: Proceedings of the 9th international conference on Passive and active network measurement. PAM’08. Cleveland, OH, USA: Springer-Verlag, 2008, pp. 31– 40. isbn: 978-3-540-79231-4 (page 44). [118] Luis Miguel Torres, Eduardo Magana, Mikel Izal and Daniel Morato. “Identifying Ses- sions to Websites as an Aggregation of Related Flows”. In: Telecommunications Network Strategy and Planning Symposium (NETWORKS), 2012 XVth International. 2012, pp. 1–6 (page 44). [119] Luis Miguel Torres, Eduardo Magana, Mikel Izal and Daniel Morato. “Strategies for auto- matic labelling of web traffic traces”. In: 37th Annual IEEE Conference on Local Computer Networks 0 (2012), pp. 196–199. issn: 0742-1303. doi: http://doi.ieeecomputersociety. org/10.1109/LCN.2012.6423605 (page 44). [120] Vinicius Gehlen, Alessandro Finamore, Marco Mellia and Maurizio M. Munafò. “Un- covering the Big Players of the Web”. In: Proceedings of the 4th international conference on Traffic Monitoring and Analysis. TMA’12. Vienna, Austria: Springer-Verlag, 2012, pp. 15– 28. isbn: 978-3-642-28533-2 (pages 44, 85). [121] Aniket Mahanti, Carey Williamson, Niklas Carlsson, Martin Arlitt and Anirban Mahanti. “Characterizing the file hosting ecosystem: A view from the edge”. In: Perform. Eval. 68.11 (November 2011), pp. 1085–1102. issn: 0166-5316. doi: 10.1016/j.peva.2011.07.016 (page 44). [122] R. Fielding and J. Reschke. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. RFC 7230 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, June 2014. url: https://www.rfc-editor.org/rfc/rfc7230.txt (page 44). [123] Tomáš Šíma, Petr Velan and Pavel Čeleda. FlowMon – Plugins for HTTP Monitoring. April 2013. url: http://dior.ics.muni.cz/~velan/flowmon-input-http/ (Accessed on 18th October 2017) (page 45). [124] Google. Verifying Googlebot. url: https://support.google.com/webmasters/answer/ 80553 (Accessed on 20th October 2017) (page 55). [125] Justin Billig, Yuri Danilchenko and Charles E. Frank. “Evaluation of Google Hacking”. In: Proceedings of the 5th Annual Conference on Information Security Curriculum Develop- ment. InfoSecCD ’08. Kennesaw, Georgia: ACM, 2008, pp. 27–32. isbn: 978-1-60558-333-4 (pages 55, 64). [126] Roberto Perdisci, Wenke Lee and Nick Feamster. “Behavioral Clustering of HTTP-based Malware and Signature Generation Using Malicious Network Traces”. In: Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation. NSDI’10. San Jose, California: USENIX Association, 2010 (page 56). [127] Martin Husák and Jakub Čegan. “PhiGARo: Automatic Phishing Detection and Incident Response Framework”. In: Availability, Reliability and Security (ARES), 2014 Ninth Inter- national Conference on. 2014, pp. 295–302 (page 56). [128] Brice Augustin and Abdelhamid Mellouk. “On Traffic Patterns of HTTP Applications”. In: Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE. December 2011 (page 56). [129] Guowu Xie, Marios Iliofotou, Thomas Karagiannis, Michalis Faloutsos and Yaohui Jin. 
“ReSurf: Reconstructing Web-Surfing Activity from Network Traffic”. In: IFIP Networking Conference, 2013. May 2013 (page 56).

[130] Ye Xu, Gang Xiong, Yong Zhao and Li Guo. “Toward Identifying and Understand- ing User-Agent Strings in HTTP Traffic”. English. In: Web Technologies and Applications. Vol. 8710. Lecture Notes in Computer Science. Springer International Publishing, 2014, pp. 177–187. isbn: 978-3-319-11118-6 (page 56). [131] Jaroslav Mallat. UASparser. 2017. url: http://user-agent-string.info/download/ UASparser (Accessed on 20th October 2017) (page 56). [132] Chang-Gyu Jin and Mi-Jung Choi. “Integrated Analysis Method on HTTP Traffic”. In: Network Operations and Management Symposium (APNOMS), 2012 14th Asia-Pacific. September 2012 (page 56). [133] Min Hur and Myung-Sup Kim. “Towards Smart Phone Traffic Classification”. In: Net- work Operations and Management Symposium (APNOMS), 2012 14th Asia-Pacific. Septem- ber 2012 (page 56). [134] Niels Provos and Thorsten Holz. Virtual Honeypots: From Botnet Tracking to Intrusion De- tection. First. Addison-Wesley Professional, 2007. isbn: 9780321336323 (page 56). [135] Monowar H. Bhuyan, D.K. Bhattacharyya and J.K. Kalita. “Surveying Port Scans and Their Detection Methodologies”. In: Comput. J. 54.10 (October 2011), pp. 1565–1581. issn: 0010-4620 (page 60). [136] Giovane César Moreira Moura. “Internet Bad Neighborhoods”. PhD thesis. Enschede: University of Twente, March 2013. url: http://doc.utwente.nl/84507/ (page 64). [137] WordPress Codex. Brute Force Attacks. url: http : / / codex . wordpress . org / Brute _ Force_Attacks (Accessed on 20th October 2017) (page 64). [138] Jan Vykopal, Martin Drašar and Philipp Winter. “Flow-based Brute-force Attack De- tection”. eng. In: Advances in IT Early Warning. Ed. by Eckehard Hermann Markus Zeilinger Peter Schoo. Stuttgart: Fraunhofer Verlag, 2013, pp. 41–51. isbn: 978-3-8396- 0474-8 (pages 64, 65). [139] Anna Sperotto, Gregor Schaffrath, Ramin Sadre, Cristian Morariu, Aiko Pras and Burkhard Stiller. “An Overview of IP Flow-Based Intrusion Detection”. In: Commu- nications Surveys Tutorials, IEEE 12.3 (April 2010), pp. 343–356. issn: 1553-877X (page 65). [140] MinGyung Kang, Juan Caballero and Dawn Song. “Distributed Evasive Scan Techniques and Countermeasures”. English. In: Detection of Intrusions and Malware, and Vulnerability Assessment. Vol. 4579. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2007, pp. 157–174. isbn: 978-3-540-73613-4 (page 65). [141] Yang Sun, Isaac G. Councill and C. Lee Giles. “The Ethicality of Web Crawlers”. In: Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on. Vol. 1. August 2010, pp. 668–675 (page 65). [142] Oliver Hohlfeld, Thomas Graf and Florin Ciucu. “Longtime Behavior of Harvesting Spam Bots”. In: Proceedings of the 2012 ACM Conference on Internet Measurement Conference. IMC ’12. Boston, Massachusetts, USA: ACM, 2012, pp. 453–460. isbn: 978-1-4503-1705-4 (page 65). [143] Kimberly C. Claffy. “Tracking IPv6 Evolution: Data We Have and Data We Need”. In: SIGCOMM Comput. Commun. Rev. 41.3 (July 2011), pp. 43–48. issn: 0146-4833. doi: 10. 1145/2002250.2002258 (page 66). [144] Mohammad Aazam, Imran Khan, Muhammad Alam and Amir Qayyum. “Comparison of IPv6 Tunneled Traffic of Teredo and ISATAP over Test-bed Setup”. In: 2010 Interna- tional Conference on Information and Emerging Technologies. June 2010, pp. 1–4 (page 67). [145] Sebastian Zander, Lachlan L.H. Andrew, Grenville Armitage, Geoff Huston and George Michaelson. “Investigating the IPv6 Teredo Tunnelling Capability and Performance of Internet Clients”. 
In: SIGCOMM Comput. Commun. Rev. 42.5 (September 2012), pp. 13–20. issn: 0146-4833. doi: 10.1145/2378956.2378959 (page 67).

[146] Nazrulazhar Bahaman, Erman Hamid and Anton Satria Prabuwono. “Network perform- ance evaluation of 6to4 tunneling”. In: 2012 International Conference on Innovation Man- agement and Technology Research. May 2012, pp. 263–268 (page 67). [147] S. Krishnan, D. Thaler and J. Hoagland. Security Concerns with IP Tunneling. RFC 6169 (Informational). RFC. Fremont, CA, USA: RFC Editor, April 2011. url: https://www. rfc-editor.org/rfc/rfc6169.txt (page 67). [148] Nadi Sarrar, Gregor Maier, Bernhard Ager, Robin Sommer and Steve Uhlig. “Investigat- ing IPv6 Traffic: What Happened at the World IPv6 Day?”In: Proceedings of the 13th Inter- national Conference on Passive and Active Measurement. PAM’12. Vienna, Austria: Springer- Verlag, 2012, pp. 11–20. isbn: 978-3-642-28536-3 (page 67). [149] C. Huitema. Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs). RFC 4380 (Proposed Standard). RFC. Updated by RFCs 5991, 6081. Fremont, CA, USA: RFC Editor, February 2006. url: https://www.rfc- editor.org/rfc/rfc4380.txt (page 68). [150] B. Carpenter and K. Moore. Connection of IPv6 Domains via IPv4 Clouds. RFC 3056 (Pro- posed Standard). RFC. Fremont, CA, USA: RFC Editor, February 2001. url: https:// www.rfc-editor.org/rfc/rfc3056.txt (page 68). [151] Martin Elich, Matěj Grégr and Pavel Čeleda. “Monitoring of Tunneled IPv6 Traffic Using Packet Decapsulation and IPFIX”. In: Proceedings of the Third International Conference on Traffic Monitoring and Analysis. TMA’11. Vienna, Austria: Springer-Verlag, 2011, pp. 64– 71. isbn: 978-3-642-20304-6 (pages 68, 76). [152] Martin Elich, Petr Velan and Pavel Čeleda. FlowMon IPv6 Tunnel Monitoring Plugin. May 2013. url: http://dior.ics.muni.cz/~velan/flowmon-input-ip6tun/ (Accessed on 25th October 2017) (page 68). [153] MaxMind, Inc. MaxMind GeoIP services. 2013. url: http://www.maxmind.com (page 70). [154] Noah Davids. Initial TTL values. April 2017. url: http://noahdavids.org/self_publi shed/TTL_values.html (Accessed on 25th October 2017) (page 70). [155] Cebrail Çiflikli, Ali Gerzer and Abdulah Tuncay Özşahin. “Packet traffic features ofIPv6 and IPv4 protocol traffic”. In: Turkish Journal of Electrical Engineering & Computer Sciences 20.5 (2012), pp. 727–749. doi: 10.3906/elk-1008-696 (page 74). [156] T. Narten, E. Nordmark, W. Simpson and H. Soliman. Neighbor Discovery for IP version 6 (IPv6). RFC 4861 (Draft Standard). RFC. Updated by RFCs 5942, 6980, 7048, 7527, 7559, 8028. Fremont, CA, USA: RFC Editor, September 2007. url: https://www.rfc-editor. org/rfc/rfc4861.txt (page 75). [157] Ethan Katz-Bassett, John P. John, Arvind Krishnamurthy, David Wetherall, Thomas An- derson and Yatin Chawathe. “Towards IP Geolocation Using Delay and Topology Meas- urements”. In: Proceedings of the 6th ACM SIGCOMM conference on Internet measurement (IMC). ACM, 2006, pp. 71–84 (page 78). [158] Brian Eriksson, Paul Barford, Joel Sommers and Robert Nowak. “A Learning-Based Ap- proach for IP Geolocation”. In: Proceedings of the 11th International Conference on Passive and Active Measurement (PAM). Lecture Notes in Computer Science. 2010, pp. 171–180 (page 78). [159] Ingmar Poese, Steve Uhlig, Mohamed Ali Kaafar, Benoit Donnet and Bamba Gueye. “IP Geolocation Databases: Unreliable?” In: SIGCOMM Comput. Commun. Rev. 41.2 (April 2011), pp. 53–56. issn: 0146-4833. doi: 10.1145/1971162.1971171 (pages 78, 80). [160] geoPlugin. geoPlugin to geolocate your visitors. url: http : / / www . geoplugin . com (Ac- cessed on 25th October 2017) (page 78). [161] MaxMind. 
GeoLite Databases. url: http://www.maxmind.com/app/geoip_country (Accessed on 25th October 2017) (pages 78, 79).

[162] IP2Location. IP Address Geolocation to Identify Website Visitor’s Geographical Location. url: http://www.ip2location.com (Accessed on 25th October 2017) (page 78). [163] ntop. nProbe. url: http://www.ntop.org/products/nprobe/ (Accessed on 25th October 2017) (page 79). [164] CERT Network Situational Awareness Team (NetSA). SiLK. url: http://tools.netsa. cert.org/silk/ (Accessed on 13th December 2012) (page 79). [165] ntop. ntop. url: http://www.ntop.org/products/ntop/ (Accessed on 25th October 2017) (page 79). [166] QoSient. ARGUS - Auditing Network Activity. url: http://www.qosient.com/argus/ argusnetflow.shtml (Accessed on 25th October 2017) (page 79). [167] Rick Hofstede. SURFmap: A Network Monitoring Tool Based on the Google Maps API. url: http://surfmap.sourceforge.net/ (Accessed on 25th October 2017) (page 79). [168] Rick Hofstede and Tiago Fioreze. “SURFmap: A Network Monitoring Tool Based on the Google Maps API”. In: IFIP/IEEE International Symposium on Integrated Network Manage- ment (IM). June 2009, pp. 676–690 (page 79). [169] Raphael Thomas Blatter. “Extending HAPviewer: Time Window, Flow Classification, and Geolocation”. MA thesis. University of Zurich, Department of Informatics, 2011. url: ftp://ftp.tik.ee.ethz.ch:21/pub/students/2011- FS/MA- 2011- 12.pdf (page 79). [170] Olivier Festor and Abdelkader Lahmadi. Information Elements for device location in IPFIX. Internet-Draft. July 2012 (page 80). [171] G.T. Matthijs van Polen, Giovane César Moreira Moura and Aiko Pras. “Finding and Analyzing Evil Cities on the Internet”. In: Proceedings of the 5th International Conference on Autonomous Infrastructure, Management and Security (AIMS). Vol. 6734. Notes in Com- puter Science. 2011, pp. 38–48 (page 83). [172] Andrew Moore, Michael Crogan and Denis Zuev. Discriminators for use in flow-based clas- sification. Tech. rep. Queen Mary, University of London, 2005. url: http://citeseerx. ist.psu.edu/viewdoc/download?doi=10.1.1.101.7450&rep=rep1&type=pdf (pages 86, 138). [173] Will E. Leland, Murad S. Taqqu, Walter Willinger and Daniel V. Wilson. “On the Self- similar Nature of Ethernet Traffic”. In: Conference Proceedings on Communications Archi- tectures, Protocols and Applications. SIGCOMM ’93. San Francisco, California, USA: ACM, 1993, pp. 183–193. isbn: 0-89791-619-0 (page 87). [174] Mark E. Crovella and Azer Bestavros. “Self-similarity in World Wide Web traffic: evid- ence and possible causes”. In: Networking, IEEE/ACM Transactions on 5.6 (December 1997), pp. 835–846. issn: 1063-6692. doi: 10.1109/90.650143 (page 87). [175] Vern Paxson and Sally Floyd. “Wide Area Traffic: The Failure of Poisson Modeling”. In: IEEE/ACM Trans. Netw. 3.3 (June 1995), pp. 226–244. issn: 1063-6692. doi: 10.1109/90. 392383 (page 87). [176] Chuck Fraleigh, Sue Moon, Bryan Lyles, Chase Cotton, Mujahid Khan, Deb Moll, Rob Rockell, Ted Seely and Sprint Christophe Diot. “Packet-level traffic measurements from the Sprint IP backbone”. In: Network, IEEE 17.6 (November 2003), pp. 6–16. issn: 0890- 8044. doi: 10.1109/MNET.2003.1248656 (page 87). [177] Marina Fomenkov, Ken Keys, David Moore and K Claffy. “Longitudinal Study of In- ternet Traffic in 1998-2003”. In: Proceedings of the Winter International Synposium on In- formation and Communication Technologies. WISICT ’04. Cancun, Mexico: Trinity College Dublin, 2004, pp. 1–6 (page 87).

[178] Wolfgang John and Sven Tafvelin. “Analysis of Internet Backbone Traffic and Header Anomalies Observed”. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement. IMC ’07. San Diego, California, USA: ACM, 2007, pp. 111–116. isbn: 978-1- 59593-908-1 (page 87). [179] Theophilus Benson, Aditya Akella and David A. Maltz. “Network Traffic Characteristics of Data Centers in the Wild”. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. IMC ’10. Melbourne, Australia: ACM, 2010, pp. 267–280. isbn: 978- 1-4503-0483-2 (page 88). [180] José Luis García-Dorado, Alessandro Finamore, Marco Mellia, Michela Meo and Maur- izio M. Munafo. “Characterization of ISP Traffic: Trends, User Habits, and Access Tech- nology Impact”. In: Network and Service Management, IEEE Transactions on 9.2 (June 2012), pp. 142–155. issn: 1932-4537. doi: 10.1109/TNSM.2012.022412.110184 (page 88). [181] Lin Quan and John Heidemann. “On the Characteristics and Reasons of Long-lived In- ternet Flows”. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measure- ment. IMC ’10. Melbourne, Australia: ACM, 2010, pp. 444–450. isbn: 978-1-4503-0483-2 (page 91). [182] Andrew W. Moore and Konstantina Papagiannaki. “Toward the Accurate Identification of Network Applications”. In: Proceedings of the 6th International Conference on Passive and Active Network Measurement. PAM’05. Boston, MA: Springer-Verlag, 2005, pp. 41–54. isbn: 978-3-540-25520-8 (page 95). [183] Intel Corporation. An Introduction to the Intel® QuickPath Interconnect. January 2009. url: https://www.intel.com/content/dam/doc/white-paper/quick-path-interconnec t-introduction-paper.pdf (Accessed on 15th November 2017) (pages 101, 118). [184] José Luis García-Dorado, Felipe Mata, Javier Ramos, Pedro M. Santiago del Río, Victor Moreno and Javier Aracil. “High-Performance Network Traffic Processing Systems Us- ing Commodity Hardware”. English. In: Data Traffic Monitoring and Analysis. Vol. 7754. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, pp. 3–27. isbn: 978-3-642-36783-0 (pages 101, 106). [185] Georges Nassopulos, Dario Rossi, Francesco Gringoli, Lorenzo Nava, Maurizio Dusi and Pedro Maria Santiago del Rio. “Flow Management at Multi-Gbps: Tradeoffs and Lessons Learned”. In: Traffic Monitoring and Analysis: 6th International Workshop, TMA 2014, Lon- don, UK, April 14, 2014. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014, pp. 1–14. isbn: 978-3-642-54999-1 (pages 101, 113). [186] The Internet Engineering Steering Group. IP Performance Measurement (ippm). October 1997. url: https://datatracker.ietf.org/wg/ippm/ (Accessed on 29th November 2017) (page 101). [187] The Internet Engineering Steering Group. Benchmarking Methodology (bmwg). October 1989. url: https://datatracker.ietf.org/wg/bmwg/ (Accessed on 9th February 2018) (page 101). [188] V. Paxson, G. Almes, J. Mahdavi and M. Mathis. Framework for IP Performance Metrics. RFC 2330 (Informational). RFC. Updated by RFC 7312. Fremont, CA, USA: RFC Editor, May 1998. url: https://www.rfc-editor.org/rfc/rfc2330.txt (page 101). [189] A. Morton. IMIX Genome: Specification of Variable Packet Sizes for Additional Testing. RFC 6985 (Informational). RFC. Fremont, CA, USA: RFC Editor, July 2013. url: https://www. rfc-editor.org/rfc/rfc6985.txt (page 101). [190] NETCOPE Technologies. Modelling of High Bandwidth Systems. August 2017. 
url: https://www.netcope.com/getattachment/ecf0b75f-b961-485a-94e1-ee6e1bd7bef4/Modelling-of-High-Bandwidth-Systems.aspx (Accessed on 30th November 2017) (page 102).

[191] Sebastian Gallenmüller, Paul Emmerich, Florian Wohlfart, Daniel Raumer and Georg Carle. “Comparison of Frameworks for High-Performance Packet IO”. In: Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems. ANCS ’15. Oakland, California, USA: IEEE Computer Society, 2015, pp. 29–38. isbn: 978-1-4673-6632-8 (pages 102, 106, 112). [192] Samuel Williams, Andrew Waterman and David Patterson. “Roofline: An Insightful Visual Performance Model for Multicore Architectures”. In: Commun. ACM 52.4 (April 2009), pp. 65–76. issn: 0001-0782. doi: 10.1145/1498765.1498785 (page 102). [193] Ryota Kawashima, Shin Muramatsu, Hiroki Nakayama, Tsunemasa Hayashi and Hiroshi Matsuo. “A Host-Based Performance Comparison of 40G NFV Environments Focusing on Packet Processing Architectures and Virtual Switches”. In: 2016 Fifth European Work- shop on Software-Defined Networks (EWSDN). October 2016, pp. 19–24 (page 102). [194] Inc. Linux Kernel Organization. PACKET_MMAP. November 2017. url: https://www. kernel.org/doc/Documentation/networking/packet%5C_mmap.txt (Accessed on 9th February 2018) (page 104). [195] Loris Degioanni, Mario Baldi, Fulvio Risso and Gianluca Varenni. “Profiling and Op- timization of Software-Based Network-Analysis Applications”. In: Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing. SBAC-PAD ’03. Washington, DC, USA: IEEE Computer Society, 2003, pp. 226–. isbn: 0-7695-2046-4 (page 104). [196] Luca Deri. “Improving Passive Packet Capture: Beyond Device Polling”. In: Proceedings of SANE. Vol. 2004. Amsterdam, Netherlands. 2004, pp. 85–93 (pages 104, 109). [197] University of Waikato. The Dag Project. url: http://dag.cs.waikato.ac.nz/ (Accessed on 10th February 2018) (page 104). [198] Endace Technology Limited. Endace Homepage. url: https://www.endace.com/ (Ac- cessed on 10th February 2018) (page 104). [199] Loris Degioanni and Gianluca Varenni. “Introducing Scalability in Network Measure- ment: Toward 10 Gbps with Commodity Hardware”. In: Proceedings of the 4th ACM SIG- COMM Conference on Internet Measurement. IMC ’04. Taormina, Sicily, Italy: ACM, 2004, pp. 233–238. isbn: 1-58113-821-0 (page 104). [200] Luca Deri. “nCap: Wire-speed Packet Capture and Transmission”. In: Workshop on End- to-End Monitoring Techniques and Services, 2005. May 2005, pp. 47–55 (page 104). [201] ntop. PF_RING DNA. February 2010. url: https://www.ntop.org/pf_ring/introduc ing-pf_ring-dna-direct-nic-access/ (Accessed on 12th February 2018) (page 105). [202] Francesco Fusco and Luca Deri. “High Speed Network Traffic Analysis with Commodity Multi-core Systems”. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. IMC ’10. Melbourne, Australia: ACM, 2010, pp. 218–224. isbn: 978-1-4503- 0483-2 (page 105). [203] ntop. PF_RING Zero Copy. url: https://www.ntop.org/products/packet-capture/ pf_ring/pf_ring-zc-zero-copy/ (Accessed on 13th February 2018) (page 105). [204] Brendan Gregg. KPTI/KAISER Meltdown Initial Performance Regressions. February 2018. url: http://www.brendangregg.com/blog/2018- 02- 09/kpti- kaiser- meltdown- performance.html (Accessed on 13th February 2018) (page 105). [205] Nicola Bonelli, Andrea Di Pietro, Stefano Giordano and Gregorio Procissi. “On Multi— gigabit Packet Capturing with Multi—core Commodity Hardware”. In: Proceedings of the 13th International Conference on Passive and Active Measurement. PAM’12. Vienna, Austria: Springer-Verlag, 2012, pp. 64–73. 
isbn: 978-3-642-28536-3 (page 105).

[206] Nicola Bonelli, Stefano Giordano, Gregorio Procissi and Luca Abeni. “A Purely Func- tional Approach to Packet Processing”. In: Proceedings of the Tenth ACM/IEEE Symposium on Architectures for Networking and Communications Systems. ANCS ’14. Los Angeles, Cali- fornia, USA: ACM, 2014, pp. 219–230. isbn: 978-1-4503-2839-5 (page 105). [207] Nicola Bonelli, Stefano Giordano and Gregorio Procissi. “Network Traffic Processing With PFQ”. In: IEEE Journal on Selected Areas in Communications 34.6 (June 2016), pp. 1819– 1833. issn: 0733-8716. doi: 10.1109/JSAC.2016.2558998 (page 105). [208] Nicola Bonelli, Stefano Giordano and Gregorio Procissi. “Enabling Packet Fan-Out in the libpcap Library for Parallel Traffic Processing”. In: 2017 Network Traffic Measurement and Analysis Conference (TMA). June 2017, pp. 1–9 (page 105). [209] Luigi Rizzo. “Netmap: a novel framework for fast packet I/O”. In: 21st USENIX Security Symposium (USENIX Security 12). 2012, pp. 101–112 (page 105). [210] Linux Foundation Project. Data Plane Development Kit. url: https://dpdk.org (Accessed on 13th February 2018) (page 106). [211] Linux Foundation Project. DPDK Supported NICs. url: http://dpdk.org/doc/nics (Accessed on 13th February 2018) (page 106). [212] Kieran Mansley, Greg Law, David Riddoch, Guido Barzini, Neil Turton and Steven Pope. “Getting 10 Gb/s from Xen: Safe and Fast Device Access from Unprivileged Do- mains”. In: Euro-Par 2007 Workshops: Parallel Processing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 224–233. isbn: 978-3-540-78474-6 (page 106). [213] Sangjin Han, Keon Jang, KyoungSoo Park and Sue Moon. “PacketShader: A GPU- accelerated Software Router”. In: Proceedings of the ACM SIGCOMM 2010 Conference. SIGCOMM ’10. New Delhi, India: ACM, 2010, pp. 195–206. isbn: 978-1-4503-0201-2 (page 106). [214] Lothar Braun, Alexander Didebulidze, Nils Kammenhuber and Georg Carle. “Com- paring and Improving Current Packet Capturing Solutions Based on Commodity Hardware”. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measure- ment. IMC ’10. Melbourne, Australia: ACM, 2010, pp. 206–217. isbn: 978-1-4503-0483-2 (page 106). [215] Haipeng Wang, Dazhong He and Huan Wang. “Comparison of High-Performance Packet Processing Frameworks on NUMA”. In: 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS). August 2016, pp. 54–58 (page 106). [216] Tom Barbette, Cyril Soldani and Laurent Mathy. “Fast Userspace Packet Processing”. In: Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems. ANCS ’15. Oakland, California, USA: IEEE Computer Society, 2015, pp. 5–16. isbn: 978-1-4673-6632-8 (page 106). [217] Intel Corporation. Intel Ethernet Converged Network Adapters XL710 10/40 GbE. 2014. url: http://www.intel.com/content/www/us/en/network-adapters/converged-networ k-adapters/ethernet-xl710.html (Accessed on 21st February 2018) (page 107). [218] Luca Deri, Alfredo Cardigliano and Francesco Fusco. “10 Gbit Line Rate Packet-to-disk Using n2disk”. In: Computer Communications Workshops (INFOCOM WKSHPS), 2013 IEEE Conference on. April 2013, pp. 441–446 (page 107). [219] Gianni Antichi, Stefano Giordano, David J. Miller and Andrew W. Moore. “Enabling open-source high speed network monitoring on NetFPGA”. In: Network Operations and Management Symposium (NOMS), 2012 IEEE. April 2012, pp. 1029–1035 (page 107). [220] Liberouter. Hanic Design. 
url: https://www.liberouter.org/technologies/hanic/ (Accessed on 19th February 2018) (page 108). [221] Shinae Woo and KyoungSoo Park. Scalable TCP session monitoring with symmetric receive-side scaling. Tech. rep. 2012. url: https://an.kaist.ac.kr/~shinae/paper/2012-srss.pdf (page 108).

[222] Martin Zadnik, Jan Korenek, Petr Kobiersky and Ondrej Lengal. “Network Probe for Flexible Flow Monitoring”. In: 2008 11th IEEE Workshop on Design and Diagnostics of Elec- tronic Circuits and Systems. April 2008, pp. 1–6 (page 109). [223] Lukas Kekely, Jan Kucera, Viktor Pus, Jan Korenek and Athanasios V. Vasilakos. “Soft- ware Defined Monitoring of Application Protocols”. In: IEEE Trans. Comput. 65.2 (Feb- ruary 2016), pp. 615–626. issn: 0018-9340. doi: 10.1109/TC.2015.2423668 (pages 110, 160). [224] Haoyu Song and John W. Lockwood. “Efficient Packet Classification for Network Intru- sion Detection Using FPGA”. In: Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-programmable Gate Arrays. FPGA ’05. Monterey, California, USA: ACM, 2005, pp. 238–245. isbn: 1-59593-029-9 (page 110). [225] Adrienne Porter Felt, Richard Barnes, April King, Chris Palmer, Chris Bentzel and Parisa Tabriz. “Measuring HTTPS Adoption on the Web”. In: 26th USENIX Security Symposium. 2017, pp. 1323–1338 (page 113). [226] Ramin Sadre, Anna Sperotto and Aiko Pras. “The Effects of DDoS Attacks on Flow Monitoring Applications”. In: 2012 IEEE Network Operations and Management Symposium. April 2012, pp. 269–277 (pages 113, 114). [227] Liberouter. COMBO-80G. url: https://www.liberouter.org/combo-80g/ (Accessed on 21st February 2018) (page 116). [228] Sandvine, Inc. Global Internet Phenomena Report 1H 2014. May 2014. url: https://www. sandvine . com / downloads / general / global - internet - phenomena / 2014 / 1h - 20 14 - global - internet - phenomena - report . pdf (Accessed on 25th November 2014) (page 128). [229] Min Zhang, Wolfgang John, Kimberly C. Claffy and Nevil Brownlee. “State of the Artin Traffic Classification: A Research Review”. In: PAM ’09: 10th International Conference on Passive and Active Measurement, Student Workshop. Seoul, Korea, 2009 (page 128). [230] Arthur Callado, Carlos Kamienski, Geza Szabo, Balazs Peter Gero, Judith Kelner, Stenio Fernandes and Djamel Sadok. “A Survey on Internet Traffic Identification”. In: Commu- nications Surveys & Tutorials, IEEE 11.3 (August 2009), pp. 37–52. issn: 1553-877X. doi: 10.1109/SURV.2009.090304 (page 128). [231] Zigang Cao, Gang Xiong, Yong Zhao, Zhenzhen Li and Li Guo. “A Survey on Encryp- ted Traffic Classification”. English. In: Applications and Techniques in Information Security. Vol. 490. Communications in Computer and Information Science. Springer Berlin Heidel- berg, 2014, pp. 73–81. isbn: 978-3-662-45669-9 (page 128). [232] Jawad Khalife, Amjad Hajjar and Jesus Diaz-Verdejo. “A multilevel taxonomy and re- quirements for an optimal traffic-classification model”. In: International Journal of Network Management 24.2 (2014), pp. 101–120. issn: 1099-1190. doi: 10.1002/nem.1855 (pages 128, 134, 135, 145). [233] ISO. ISO/IEC 7498-1:1994 Information technology – Open Systems Interconnection – Basic Reference Model: The Basic Model. 2nd ed. Geneva, Switzerland: International Organiza- tion for Standardization, November 1994. url: http://www.iso.org/iso/catalogue_ detail.htm?csnumber=20269 (page 129). [234] Sheila Frankel, Karen Kent, Ryan Lewkowski, Angela D. Orebaugh, Ronald W. Ritchey and Steven R. Sharma. SP 800-77. Guide to IPsec VPNs. Tech. rep. Gaithersburg, MD, United States: National Institute of Standards & Technology, 2005. url: http://csrc. nist.gov/publications/nistpubs/800-77/sp800-77.pdf (page 129). [235] C. Kaufman, P. Hoffman, Y. Nir and P. Eronen. Internet Key Exchange Protocol Version 2 (IKEv2). 
RFC 5996 (Proposed Standard). RFC. Obsoleted by RFC 7296, updated by RFCs 5998, 6989. Fremont, CA, USA: RFC Editor, September 2010. url: https://www.rfc-editor.org/rfc/rfc5996.txt (page 129).

[236] S. Kent. IP Encapsulating Security Payload (ESP). RFC 4303 (Proposed Standard). RFC. Fremont, CA, USA: RFC Editor, December 2005. url: https://www.rfc-editor.org/ rfc/rfc4303.txt (page 129). [237] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246 (Proposed Standard). RFC. Updated by RFCs 5746, 5878, 6176, 7465, 7507, 7568, 7627, 7685, 7905, 7919. Fremont, CA, USA: RFC Editor, August 2008. url: https://www.rfc- editor.org/rfc/rfc5246.txt (page 130). [238] A. Freier, P. Karlton and P. Kocher. The Secure Sockets Layer (SSL) Protocol Version 3.0. RFC 6101 (Historic). RFC. Fremont, CA, USA: RFC Editor, August 2011. url: https : //www.rfc-editor.org/rfc/rfc6101.txt (page 130). [239] Christopher Meyer. “20 Years of SSL/TLS Research An Analysis of the Internet’s Secur- ity Foundation”. PhD thesis. Ruhr-University Bochum, February 2014. url: http://www- brs.ub.ruhr- uni- bochum.de/netahtml/HSS/Diss/MeyerChristopher/diss.pdf (page 130). [240] D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley and W. Polk. Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL). Profile RFC 5280 (Proposed Standard). RFC. Updated by RFC 6818. Fremont, CA, USA: RFC Editor, May 2008. url: https://www.rfc-editor.org/rfc/rfc5280.txt (pages 130, 133). [241] T. Ylonen and C. Lonvick. The Secure Shell (SSH) Transport Layer Protocol. RFC 4253 (Pro- posed Standard). RFC. Updated by RFC 6668. Fremont, CA, USA: RFC Editor, January 2006. url: https://www.rfc-editor.org/rfc/rfc4253.txt (pages 131, 133). [242] J. Galbraith and O. Saarenmaa. SSH File Transfer Protocol draft-ietf-secsh-filexfer-13.txt. Internet-Draft. July 2006. url: https://tools.ietf.org/html/draft-ietf-secsh- filexfer-13 (page 131). [243] Jan Pechanec. How the SCP protocol works. Jan Pechanec’s Weblog. July 2007. url: https: / / blogs . oracle . com / janp / entry / how _ the _ scp _ protocol _ works (Accessed on 30th October 2014) (page 131). [244] William Stallings. “Protocol Basics: Secure Shell Protocol”. In: The Internet Protocol Journal 12.4 (December 2009), pp. 18–30 (page 131). [245] David Harrison. Index of BitTorrent Enhancement Proposals. October 2014. url: http://www .bittorrent.org/beps/bep_0000.html (Accessed on 27th November 2014) (page 131). [246] Azureus Software Inc. Message Stream Encryption. Vuze Wiki. May 2014. url: http:// wiki . vuze . com / w / Message _ Stream _ Encryption (Accessed on 31st October 2014) (page 131). [247] Skype and Microsoft. Skype. 2014. url: http : / / www . skype . com/ (Accessed on 25th November 2014) (page 132). [248] Davide Adami, Christian Callegar, Stefano Giordano, Michele Pagano and Teresa Pepe. “Skype-Hunter: A real-time system for the detection and classification of Skype traffic”. In: International Journal of Communication Systems 25.3 (February 2011), pp. 386–403. doi: 10.1002/dac.1247 (page 132). [249] Skype Limited. Skype Connect™ Requirements Guide. 2011. url: http://download.skype. com/share/business/guides/skype-connect-requirements-guide.pdf (Accessed on 1st November 2014) (page 132). [250] IANA - Internet Assigned Numbers Authority. Protocol Registries. 2014. url: http:// www.iana.org/protocols (Accessed on 28th November 2014) (page 133). [251] Qualys, Inc. HTTP Client Fingerprinting Using SSL Handshake Analysis. 2014. url: https: //www.ssllabs.com/projects/client-fingerprinting/ (Accessed on 28th Novem- ber 2014) (page 133).

[252] Marek Majkowski. SSL fingerprinting for p0f. June 2012. url: https://idea.popcount. org/2012-06-17-ssl-fingerprinting-for-p0f/ (Accessed on 28th November 2014) (page 133). [253] Ralph Holz, Lothar Braun, Nils Kammenhuber and Georg Carle. “The SSL Landscape: A Thorough Analysis of the X.509 PKI Using Active and Passive Measurements”. In: Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference. IMC ’11. Berlin, Germany: ACM, 2011, pp. 427–444. isbn: 978-1-4503-1013-0 (page 133). [254] Zakir Durumeric, James Kasten, Michael Bailey and J. Alex Halderman. “Analysis of the HTTPS Certificate Ecosystem”. In: Proceedings of the 2013 Conference on Internet Meas- urement Conference. IMC ’13. Barcelona, Spain: ACM, 2013, pp. 291–304. isbn: 978-1-4503- 1953-9 (page 133). [255] sslbl.abuse.ch. SSL Blacklist. 2014. url: https : / / sslbl . abuse . ch/ (Accessed on 28th November 2014) (page 134). [256] S. Blake-Wilson, M. Nystrom, D. Hopwood, J. Mikkelsen and T. Wright. Transport Layer Security (TLS) Extensions. RFC 4366 (Proposed Standard). RFC. Obsoleted by RFCs 5246, 6066, updated by RFC 5746. April 2006. url: https://www.rfc- editor.org/rfc/ rfc4366.txt (page 134). [257] Lin-Shung Huang, Shrikant Adhikarla, Dan Boneh and Collin Jackson. “An Experi- mental Study of TLS Forward Secrecy Deployments”. In: Internet Computing, IEEE 18.6 (November 2014), pp. 43–51. doi: 10.1109/MIC.2014.86 (page 134). [258] Brad Miller, Ling Huang, Anthony D. Joseph and J.D. Tygar. “I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis”. In: Privacy Enhancing Technologies. Vol. 8555. Lecture Notes in Computer Science. Springer International Pub- lishing, 2014, pp. 143–163 (page 134). [259] Robert Koch and Gabi Dreo Rodosek. “Command Evaluation in Encrypted Remote Ses- sions”. In: Proceedings of the 2010 Fourth International Conference on Network and System Se- curity. NSS ’10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 299–305. isbn: 978-0-7695-4159-4 (pages 134, 142, 143). [260] Laurens Hellemons, Luuk Hendriks, Rick Hofstede, Anna Sperotto, Ramin Sadre and Aiko Pras. “SSHCure: A Flow-Based SSH Intrusion Detection System”. In: Dependable Networks and Services. Vol. 7279. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 86–97 (page 134). [261] ipoque GmbH. PACE 2.0. url: http://www.ipoque.com/en/products/pace (Accessed on 25th November 2014) (page 136). [262] ClearFoundation. l7-filter. October 2013. url: http://l7- filter.clearfoundation. com/ (Accessed on 26th November 2014) (page 137). [263] Ethem Alpaydin. Introduction to Machine Learning. 2nd. The MIT Press, 2010. isbn: 978- 0262012430 (page 138). [264] Guang-Lu Sun, Yibo Xue, Yingfei Dong, Dongsheng Wang and Chenglong Li. “An Novel Hybrid Method for Effectively Classifying Encrypted Traffic”. In: Global Telecommunic- ations Conference (GLOBECOM 2010), 2010 IEEE. Miami, Florida, USA, December 2010, pp. 1–5 (pages 138, 143, 144). [265] Yohei Okada, Shingo Ata, Nobuyuki Nakamura, Yoshihiro Nakahira and Ikuo Oka. “Ap- plication Identification from Encrypted Traffic Based on Characteristic Changes byEn- cryption”. In: Communications Quality and Reliability (CQR), 2011 IEEE International Work- shop Technical Committee on. Naples, Florida, USA, May 2011, pp. 1–6 (pages 138, 143). [266] Daniel J. Arndt and A. Nur Zincir-Heywood. “A Comparison of Three Machine Learn- ing Techniques for Encrypted Network Traffic Analysis”. 
In: Computational Intelligence for Security and Defense Applications (CISDA), 2011 IEEE Symposium on. Paris, France, April 2011, pp. 107–114 (pages 139, 143, 144).

[267] Riyad Alshammari, Peter Lichodzijewski, I. Malcolm Heywood and A. Nur Zincir- Heywood. “Classifying SSH Encrypted Traffic with Minimum Packet Header Features Using Genetic Programming”. In: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers. GECCO ’09. Montreal, Québec, Canada: ACM, 2009, pp. 2539–2546. isbn: 978-1-60558-505-5 (pages 139, 142–144). [268] Riyad Alshammari and A. Nur Zincir-Heywood. “A Flow Based Approach for SSH Traffic Detection”. In: Systems, Man and Cybernetics, 2007. ISIC. IEEE International Con- ference on. Montreal, Quebec, Canada, October 2007, pp. 296–301 (pages 139, 143, 144). [269] Riyad Alshammari and A. Nur Zincir-Heywood. “A Preliminary Performance Compar- ison of Two Feature Sets for Encrypted Traffic Classification”. English. In: Proceedings of the International Workshop on Computational Intelligence in Security for Information Systems CISIS’08. Vol. 53. Advances in Soft Computing. Genoa, Italy: Springer Berlin Heidelberg, 2009, pp. 203–210. isbn: 978-3-540-88180-3 (pages 139, 142–144). [270] Riyad Alshammari and A. Nur Zincir-Heywood. “Machine learning based encrypted traffic classification: Identifying SSH and Skype”. In: Computational Intelligence for Secur- ity and Defense Applications, 2009. CISDA 2009. IEEE Symposium on. Ottawa, Canada, July 2009, pp. 1–8 (pages 139, 143, 144). [271] Riyad Alshammari and A. Nur Zincir-Heywood. “An Investigation on the Identification of VoIP traffic: Case study on Gtalk and Skype”. In: Network and Service Management (CNSM), 2010 International Conference on. Niagara Falls, Canada, October 2010, pp. 310– 313 (pages 139, 143, 144). [272] Riyad Alshammari and A. Nur Zincir-Heywood. “Can encrypted traffic be identified without port numbers, IP addresses and payload inspection?” In: Computer Networks 55.6 (2011), pp. 1326–1350. issn: 1389-1286. doi: http : / / dx . doi . org / 10 . 1016 / j . comnet.2010.12.002 (pages 139, 142–144). [273] Yuichi Kumano, Shingo Ata, Nobuyuki Nakamura, Yoshihiro Nakahira and Ikuo Oka. “Towards real-time processing for application identification of encrypted traffic”. In: Computing, Networking and Communications (ICNC), 2014 International Conference on. Hon- olulu, Hawaii, USA, February 2014, pp. 136–140 (pages 139, 143, 144). [274] Laurent Bernaille and Renata Teixeira. “Early Recognition of Encrypted Applications”. In: Proceedings of the 8th International Conference on Passive and Active Network Measurement. PAM’07.Louvain-la-Neuve, Belgium: Springer-Verlag, 2007, pp. 165–175. isbn: 978-3-540- 71616-7 (pages 139, 143, 144). [275] Gianluca Maiolini, Andrea Baiocchi, Alfonso Iacovazzi and Antonello Rizzi. “Real Time Identification of SSH Encrypted Application Flows by Using Cluster Analysis Tech- niques”. English. In: NETWORKING 2009. Vol. 5550. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009, pp. 182–194. isbn: 978-3-642-01398-0 (pages 139, 143, 144). [276] Carlos Bacquet, A. Nur Zincir-Heywood and Malcolm I. Heywood. “An Investigation of Multi-objective Genetic Algorithms for Encrypted Traffic Identification”. English. In: Computational Intelligence in Security for Information Systems. Vol. 63. Advances in Intelli- gent and Soft Computing. Springer Berlin Heidelberg, 2009, pp. 93–100. isbn: 978-3-642- 04090-0 (pages 140, 143, 144). [277] Carlos Bacquet, A. Nur Zincir-Heywood and Malcolm I. Heywood. “Genetic Optimiza- tion and Hierarchical Clustering Applied to Encrypted Traffic Identification”. 
In: Computational Intelligence in Cyber Security (CICS), 2011 IEEE Symposium on. Paris, France, April 2011, pp. 194–201 (pages 140, 143, 144).

[278] Roni Bar-Yanai, Michael Langberg, David Peleg and Liam Roditty. “Realtime Classifica- tion for Encrypted Traffic”. English. In: Experimental Algorithms. Vol. 6049. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010, pp. 373–385. isbn: 978-3-642- 13192-9 (pages 140, 143, 144). [279] Meng Zhang, Hongli Zhang, Bo Zhang and Gang Lu. “Encrypted Traffic Classification Based on an Improved Clustering Algorithm”. English. In: Trustworthy Computing and Services. Vol. 320. Communications in Computer and Information Science. Springer Ber- lin Heidelberg, 2013, pp. 124–131. isbn: 978-3-642-35794-7 (pages 140, 143, 144). [280] Ye Du and Ruhui Zhang. “Design of a Method for Encrypted P2P Traffic Identification Using K-means Algorithm”. English. In: Telecommunication Systems 53.1 (2013), pp. 163– 168. issn: 1018-4864. doi: 10.1007/s11235-013-9690-5 (pages 140, 143). [281] Annie De Montigny-Leboeuf. “Flow attributes for use in traffic characterization”. In: Communications Research Centre Canada, Tech. Rep (2005) (pages 140, 143). [282] Yipeng Wang, Zhibin Zhang, Li Guo and Shuhao Li. “Using Entropy to Classify Traffic More Deeply”. In: Networking, Architecture and Storage (NAS), 2011 6th IEEE International Conference on. Dalian, Liaoning, China, July 2011, pp. 45–52 (pages 140, 143, 144). [283] Maciej Korczynski and Andrzej Duda. “Classifying Service Flows in the Encrypted Skype Traffic”. In: Communications (ICC), 2012 IEEE International Conference on. Ottawa, Canada, June 2012, pp. 1064–1068 (pages 141, 143, 144). [284] Payam Vahdani Amoli and Timo Hamalainen. “A Real Time Unsupervised NIDS for Detecting Unknown and Encrypted Network Attacks in High Speed Network”. In: Meas- urements and Networking Proceedings (M N), 2013 IEEE International Workshop on. Naples, Italy, October 2013, pp. 149–154 (pages 141, 143). [285] Maciej Korczynski and Andrzej Duda. “Markov Chain Fingerprinting to Classify Encrypted Traffic”. In: INFOCOM, 2014 Proceedings IEEE. April 2014, pp. 781–789 (pages 141, 143, 144). [286] Thomas Karagiannis, Konstantina Papagiannaki and Michalis Faloutsos. “BLINC: Mul- tilevel Traffic Classification in the Dark”. In: Proceedings of the 2005 Conference on Applic- ations, Technologies, Architectures, and Protocols for Computer Communications. SIGCOMM ’05. Philadelphia, Pennsylvania, USA: ACM, 2005, pp. 229–240. isbn: 1-59593-009-4 (pages 141, 143, 144). [287] Charles V. Wright, Fabian Monrose and Gerald M. Masson. “On Inferring Application Protocol Behaviors in Encrypted Network Traffic”. In: J. Mach. Learn. Res. 7 (December 2006), pp. 2745–2769. issn: 1532-4435 (pages 142, 143). [288] Amir R. Khakpour and Alex X. Liu. “An Information-Theoretical Approach to High- Speed Flow Nature Identification”. In: Networking, IEEE/ACM Transactions on 21.4 (August 2013), pp. 1076–1089. issn: 1063-6692. doi: 10 . 1109 / TNET . 2012 . 2219591 (pages 142–144). [289] CAIDA. Cyber-security Research Ethics Dialog & Strategy Workshop. May 2013. url: http:// www.caida.org/workshops/creds/1305/ (Accessed on 22nd February 2018) (page 145). [290] IETF. IETF 88 Proceedings, Technical Plenary. 2014. url: http://www.ietf.org/proceed ings/88/technical-plenary.html (Accessed on 3rd December 2014) (page 146). [291] Yu Wang, Yang Xiang, Jun Zhang, Wanlei Zhou and Bailin Xie. “Internet traffic cluster- ing with side information”. In: Journal of Computer and System Sciences 80.5 (2014). 
Special Issue on Dependable and Secure Computing The 9th IEEE International Conference on Dependable, Autonomic and Secure Computing, pp. 1021–1036. issn: 0022-0000. doi: http://dx.doi.org/10.1016/j.jcss.2014.02.008 (page 148). [292] Harsha V. Madhyastha and Balachander Krishnamurthy. “A Generic Language for Application-specific Flow Sampling”. In: SIGCOMM Comput. Commun. Rev. 38.2 (March 2008), pp. 5–16. issn: 0146-4833. doi: 10.1145/1355734.1355736 (page 148).

[293] Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella and Sanjay G. Rao. “A Flow Measurement Architecture to Preserve Application Structure”. In: Comput. Netw. 77.C (February 2015), pp. 181–195. issn: 1389-1286. doi: 10.1016/j.comnet.2014.11.005 (page 148). [294] Yan Hu, Dah-Ming Chiu and John C. S. Lui. “Entropy Based Adaptive Flow Aggreg- ation”. In: IEEE/ACM Trans. Netw. 17.3 (June 2009), pp. 698–711. issn: 1063-6692. doi: 10.1109/TNET.2008.2002560 (page 148). [295] Lautaro Dolberg, Jérôme François and Thomas Engel. “Efficient Multidimensional Ag- gregation for Large Scale Monitoring”. In: Proceedings of the 26th International Conference on Large Installation System Administration: Strategies, Tools, and Techniques. lisa’12. San Diego, CA: USENIX Association, 2012, pp. 163–180 (page 149).

Appendix A

List of Authored Publications

A.1 Impacted Journals

1. Petr Velan, Milan Čermák, Pavel Čeleda, and Martin Drašar. “A Survey of Methods for Encrypted Traffic Classification and Analysis”. In: International Journal of Network Management 25.5 (2015), pp. 355–374. doi: 10.1002/nem.1901 • Impact Factor: 0.681, Contribution: 50%

A.2 Conference Proceedings

1. Petr Velan, Martin Husák, and Daniel Tovarňák. “Rapid Prototyping of Flow-Based Detection Methods Using Complex Event Processing”. In: NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. 2018, Accepted for publication, 3 pages • CORE Ranking: B, Demo paper, Contribution: 50%

2. Petr Velan. “Improving Network Flow Definition: Formalization and Applicability”. In: NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium. 2018, Accepted for publication, 5 pages • CORE Ranking: B, Contribution: 100%

3. Luuk Hendriks, Petr Velan, Ricardo de O. Schmidt, Pieter-Tjerk de Boer, and Aiko Pras. “Threats and Surprises behind IPv6 Extension Headers”. In: 2017 Network Traffic Measurement and Analysis Conference (TMA). IEEE Xplore Digital Library, June 2017, pp. 1–9 • CORE Ranking: –, Contribution: 10%

4. Luuk Hendriks, Petr Velan, Ricardo de O. Schmidt, Pieter-Tjerk de Boer, and Aiko Pras. “Flow-Based Detection of IPv6-specific Network Layer Attacks”. In: Security of Networks and Services in an All-Connected World: 11th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Management, and Security, AIMS 2017, Zurich, Switzerland, July 10–13, 2017, Proceedings. Cham: Springer International Publishing, 2017, pp. 137–142. isbn: 978-3-319-60774-0 • CORE Ranking: –, Contribution: 10%

5. Petr Velan, Jana Medková, Tomáš Jirsík, and Pavel Čeleda. “Network Traffic Characterisation Using Flow-Based Statistics”. In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. April 2016, pp. 907–912 • CORE Ranking: B, Contribution: 40%

6. Petr Velan. “EventFlow: Network Flow Aggregation Based on User Actions”. In: NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium. April 2016, pp. 767–771 • CORE Ranking: B, Contribution: 100%

7. Petr Velan and Viktor Puš. “High-Density Network Flow Monitoring”. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). May 2015, pp. 996–1001 • CORE Ranking: A, Contribution: 70%

8. Viktor Puš, Petr Velan, Lukáš Kekely, Jan Kořenek, and Pavel Minařík. “Hardware Accelerated Flow Measurement of 100 Gb Ethernet”. eng. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). Ottawa, Canada: IEEE Xplore Digital Library, May 2015, pp. 1147–1148 • CORE Ranking: A, Demo paper, Contribution: 50%

9. Martin Husák, Petr Velan, and Jan Vykopal. “Security Monitoring of HTTP Traffic Using Extended Flows”. In: 2015 10th International Conference on Availability, Reliability and Security. August 2015, pp. 258–265 • CORE Ranking: –, FARES Workshop, Contribution: 20%

10. Petr Velan and Pavel Čeleda. “Next Generation Application-Aware Flow Monitoring”. English. In: Monitoring and Securing Virtualized Networks and Services. Vol. 8508. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2014, pp. 173–178. isbn: 978-3-662-43861-9 • CORE Ranking: –, Contribution: 90%

11. Petr Velan, Tomáš Jirsík, and Pavel Čeleda. “Design and Evaluation of HTTP Protocol Parsers for IPFIX Measurement”. In: Advances in Communication Networking. Vol. 8115. Heidelberg: Springer Berlin Heidelberg, 2013, pp. 136–147. isbn: 978-3-642-40551-8 • CORE Ranking: –, Contribution: 50%

12. Petr Velan. “Practical Experience with IPFIX Flow Collectors”. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013). Ghent, Belgium: IEEE Xplore Digital Library, May 2013, pp. 1021–1026. isbn: 978-1-4673-5229-1 • CORE Ranking: A, Contribution: 100%

13. Martin Elich, Petr Velan, Tomáš Jirsík, and Pavel Čeleda. “An Investigation Into Teredo and 6to4 Transition Mechanisms: Traffic Analysis”. In: IEEE 38th Conference on Local Computer Networks Workshops (LCN Workshops). Sydney, Australia: IEEE Xplore Digital Library, October 2013, pp. 1046–1052. isbn: 978-1-4799-0540-9 • CORE Ranking: –, WNM Workshop, Contribution: 25%

14. Pavel Čeleda, Petr Velan, Martin Rábek, Rick Hofstede, and Aiko Pras. “Large-Scale Geolocation for NetFlow”. In: IFIP/IEEE International Symposium on Integrated Network Management (IM 2013). Ghent, Belgium: IEEE Xplore Digital Library, May 2013, pp. 1015–1020. isbn: 978-1-4673-5229-1 • CORE Ranking: A, Contribution: 20%

15. Petr Velan and Radek Krejčí. “Flow Information Storage Assessment Using IPFIXcol”. In: Dependable Networks and Services. Vol. 7279. Lecture Notes in Computer Science. Heidelberg: Springer Berlin Heidelberg, 2012, pp. 155–158. isbn: 978-3-642-30632-7 • CORE Ranking: –, Contribution: 70%
