Deep Packet: a Novel Approach for Encrypted Traffic Classification
Total Page:16
File Type:pdf, Size:1020Kb
Soft Computing manuscript No. (will be inserted by the editor) Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning Mohammad Lotfollahi · Mahdi Jafari Siavoshani · Ramin Shirali Hossein Zade · Mohammdsadegh Saberian Received: date / Accepted: date Abstract Internet traffic classification has become more Keywords Application Identification · Traffic important with rapid growth of current Internet net- Characterization · Deep Learning · Convolutional work and online applications. There have been numerous Neural Networks · Stacked Autoencoder · Deep Packet. studies on this topic which have led to many different approaches. Most of these approaches use predefined features extracted by an expert in order to classify net- 1 Introduction work traffic. In contrast, in this study, we propose a deep learning based approach which integrates both fea- Network traffic classification is an important task in ture extraction and classification phases into one system. modern communication networks [5]. Due to the rapid Our proposed scheme, called \Deep Packet," can han- growth of high throughput traffic demands, to properly dle both traffic characterization in which the network manage network resources, it is vital to recognize dif- traffic is categorized into major classes (e.g., FTP and ferent types of applications utilizing network resources. P2P) and application identification in which end-user Consequently, accurate traffic classification has become applications (e.g., BitTorrent and Skype) identification one of the prerequisites for advanced network manage- is desired. Contrary to most of the current methods, ment tasks such as providing appropriate Quality-of- Deep Packet can identify encrypted traffic and also dis- Service (QoS), anomaly detection, pricing, etc. Traffic tinguishes between VPN and non-VPN network traffic. classification has attracted a lot of interests in both After an initial pre-processing phase on data, packets academia and industrial activities related to network are fed into Deep Packet framework that embeds stacked management (e.g., see [13], [15], [42] and the references autoencoder and convolution neural network in order therein). to classify network traffic. Deep packet with CNN as its As an example of the importance of network traffic classification model achieved recall of 0:98 in application classification, one can think of the asymmetric archi- identification task and 0:94 in traffic categorization task. tecture of today's network access links, which has been To the best of our knowledge, Deep Packet outperforms arXiv:1709.02656v3 [cs.LG] 4 Jul 2018 designed based on the assumption that clients download all of the proposed classification methods on UNB ISCX more than what they upload. However, the pervasiveness VPN-nonVPN dataset. of symmetric-demand applications (such as peer-to-peer (P2P) applications, voice over IP (VoIP) and video call) Mohammad Lotfollahi has changed the clients' demands to deviate from the Sharif University of Technology, Teharn, Iran assumption mentioned earlier. Thus, to provide a satis- Mahdi Jafari Siavoshani factory experience for the clients, other application-level Sharif University of Technology, Teharn, Iran knowledge is required to allocate adequate resources to E-mail: [email protected] such applications. Ramin Shirali Hossein Zade The emergence of new applications as well as in- Sharif University of Technology, Teharn, Iran teractions between various components on the Internet Mohammdsadegh Saberian has dramatically increased the complexity and diversity Sharif University of Technology, Teharn, Iran of this network which makes the traffic classification a 2 Mohammad Lotfollahi et al. difficult problem per se. In the following, we discuss in The rest of paper is organized as follows. In Section 2, details some of the most critical challenges of network we review some of the most important and recent stud- traffic classification. ies on traffic classification. In Section 3, we present the First, the increasing demand for user's privacy and essential background on deep learning which is necessary data encryption has tremendously raised the amount to our work. Section 4 presents our proposed method, of encrypted traffic in today's Internet [42]. Encryption i.e., Deep Packet. The results of the proposed scheme procedure turns the original data into a pseudo-random- on network application identification and traffic charac- like format to make it hard to decrypt. In return, it terization tasks are described in Section 5. In Section 6, causes the encrypted data scarcely contain any discrim- we provide further discussion on experimental results. inative patterns to identify network traffic. Therefore, Finally, we conclude the paper in Section 7. accurate classification of encrypted traffic has become a challenge in modern networks [13]. 2 Related Works It is also worth mentioning that many of the pro- posed network traffic classification approaches, such In this section, we provide an overview of the most as payload inspection as well as machine learning and important network traffic classification methods. In par- statistical methods, require patterns or features to be ticular, we can categorize these approaches into three extracted by experts. This process is prone to error, main categories as follows: (I) port-based, (II) payload time-consuming and costly. inspection, and (III) statistical and machine learning. Finally, many of the Internet service providers (ISPs) Here is a brief review of the most important and re- block P2P file sharing applications because of their cent studies regarding each of the approaches mentioned high bandwidth consumption and copyright issues [27]. above. These applications use protocol embedding techniques Port-based approach: Traffic classification via to bypass traffic control systems [3]. The identification port number is the oldest and the most well-known of this kind of applications is one of the most challenging method for this task [13]. Port-based classifiers use the task in network traffic classification. information in the TCP/UDP headers of the packets to There have been abundant studies on the network extract the port number which is assumed to be associ- traffic classification subject, e.g., [22, 32, 16, 30]. How- ated with a particular application. After the extraction ever, most of them have focused on classifying a proto- of the port number, it is compared with the assigned col family, also known as traffic characterization (e.g., IANA TCP/UDP port numbers for traffic classification. streaming, chat, P2P, etc.), instead of identifying a single The extraction is an easy procedure, and port numbers application, which is known as application identification will not be affected by encryption schemes. Because of (e.g., Spotify, Hangouts, BitTorrent, etc.) [20]. In con- the fast extraction process, this method is often used in trast, this work proposes a method, i.e., Deep Packet, firewall and access control list (ACL) [34]. Port-based based on the ideas recently developed in the machine classification is known to be among the simplest and learning community, namely, deep learning, [6, 23], to fastest method for network traffic identification. How- both characterize and identify the network traffic. The ever, the pervasiveness of port obfuscation, network benefits of our proposed method which make it superior address translation (NAT), port forwarding, protocol to other classification schemes are stated as follows: embedding and random ports assignments have signifi- cantly reduced the accuracy of this approach. According { In Deep Packet, there is no need for an expert to to [30, 28] only 30% to 70% of the current Internet traffic extract features related to network traffic. In the light can be classified using port-based classification methods. of this approach, the cumbersome step of finding and For these reasons, more complex traffic classification extracting distinguishing features has been omitted. methods are needed to classify modern network traffic. { Deep Packet can identify traffic at both granular Payload Inspection Techniques: These techniques levels (application identification and traffic charac- are based on the analysis of information available in terization) with state-of-the-art results compared to the application layer payload of packets [20]. Most of the other works conducted on similar dataset [16, 47]. the payload inspection methods, also known as deep { Deep Packet can accurately classify one of the hard- packet inspection (DPI), use predefined patterns like est class of applications, known to be P2P [20]. This regular expressions as signatures for each protocol (e.g., kind of applications routinely uses advanced port see [48, 36]). The derived patterns are then used to obfuscation techniques, embedding their information distinguish protocols form each other. The need for up- in well-known protocols' packets and using random dating patterns whenever a new protocol is released, ports to circumvent ISPs' controlling processes. and user privacy issues are among the most important Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning 3 drawbacks of this approach. Sherry et al. proposed a prone to human mistakes. Moreover, note that for the new DPI system that can inspect encrypted payload case of using k-NN classifiers, as suggested by [47], it without decryption, thus solved the user privacy issue, is known that, when used for prediction, the execution but it can only process HTTP Secure (HTTPS) traffic time of this algorithms is a major concern. [37]. To the best of