______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

Network Anomaly Detection 8sing Artificial Neural Networks

Sergey Andropov, Alexei Guirik Mikhail Budko, Marina Budko ITMO University ITMO University Saint-Petersburg, Russia Saint-Petersburg, Russia [email protected], [email protected] [email protected], [email protected] Line 4 - Author1, [email protected] Abstract—This paper presents a method of identifying and • semantic gap between results and their operational classifying network anomalies using an artificial neural network interpretation; for analyzing data gathered via Netflow protocol. Potential anomalies and their properties are described. We propose using a • high variance in network’s behavior; multilayer , trained with the algorithm. We experiment both with datasets acquired from a • difficulties in evaluating and verifying results. real ISP monitoring system and with datasets modified to Dealing with these problems is necessary to construct an simulate the presence of anomalies; some Netflow records are effective anomaly detection system. One part of the solution is modified to contain known patterns of several network attacks. setting a narrower focus for the analysis program, such as We evaluate the viability of the approach by practical limiting the scope of the network that is being monitored. experimentation with various anomalies and iteration sizes. Any network will change over time, as the new hardware I. INTRODUCTION gets installed, the new programs with different behavior Information security of modern computer networks is a patterns emerge, etc. Usage of a neural network allows the growing source of concerns. The need for tools capable of system to adapt to the gradual changes as it can adjust its detecting the ever-increasing number of network attacks, weights to accommodate for the shifts in the behavior. viruses and other incidents continues to grow. There are Another problem of automated anomaly detection is the methods of anomaly detection based on known signatures and difficulty of obtaining proper training data and filtering the metrics; while giving acceptable results and having a low false network noise. Usage of Netflow protocol allows countering alarm rates, they are normally unable to detect a previously some of these difficulties, as it provides vital information unknown attack or vectors of a virus spreading, which means without overloading the system with too much data and thus such programs have to rely on databases being updated on a has relatively small storage requirements [10]. regular basis. The rest of the article is structured as follows: section II Anomaly can be defined as a deviation from the normal contains the overall design of the detection method; section III behavior. An anomaly can be defined as "an event (or object) describes network anomalies and their typical properties; that differs from some standard or reference event, in excess of sections IV and V contain Netflow packet structure and the some threshold, in accordance with some similarity or distance aggregation criteria; section VI shows the design and the metric on the event" [1]. Major sources of network anomalies training process of the neural network; sections VII presents are network attacks, hardware or software malfunctions and malware. Anomalies should be considered dangerous, as the dataset acquisition methods and evaluation of the detection breaking or disabling one component of the network can lead algorithm; finally, conclusions are drawn in section VIII. to the entire network being compromised. Detecting – and II. DESIGN classifying – an anomaly is an important step in preventing, or at least reducing, the potential damage caused by its The detection system has the following capacities: occurrence. 1) Offline traffic analysis. With this type of analysis a This article presents a method of detecting and classifying model of normal behavior for the network can be created. It an anomaly using an artificial neural network that analyses also provides information about different anomalies. data received with the Netflow protocol. Implementation of offline analysis requires a dataset of normal network behavior for a certain period. The topic of using for intrusion and anomaly detection is a well researched one [2], [3]. As stated 2) Online traffic analysis. This is the primary mode for in [2], using neural networks and other means of adaptable the system. Data from the Netflow protocol is received, systems for network monitoring presents a set of challenges, processed and stored by the system; different aspects of it are such as: analyzed by an artificial neural network in real time. If an anomaly is detected, a warning will be issued along with a • high cost of errors; report containing information about the incident. • difficulties of obtaining for specific kinds Data aggregation allows the system to see patterns in the of anomalies/attacks; otherwise highly variable traffic properties. Netflow protocol

ISSN 2305-7254 ______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

is used to receive information from the network devices, which is then filtered and aggregated based on a number of criteria such as number of packets per hour, average packet size, port usage. This data serves as an input to a neural network, which consists of three layers: input layer, hidden layer and output layer. The outputs of the neural network show if an anomaly is present. For every known anomaly the system is designed to detect, there is a neuron that shows the probability of that anomaly occurring. There is an additional neuron for an “unknown” anomaly, as well as one for a “normal” behavior. We discuss further details on the internal design of the neural network in section VI.

III. NETWORK ANOMALIES Network anomaly is a situation when the network behaves differently from its established pattern. There are numerous reasons for anomalies to appear: Fig. 2. An example of network anomaly: abnormally high number of • hardware malfunctions discarded packets • unauthorized actions of the staff • intentional security breaches involving the staff • network attacks • virus infection process • "flash-crowd" behavior Anomalies will have different features depending on their source. For example, hardware malfunctions are likely to have distinct drops in incoming/outgoing traffic, as presented in Fig. 1.

Fig. 3. An example of network anomaly: GRE active

The topic of network attacks has been researched extensively [4], [5]. We will give short descriptions along with identifiable properties of each anomaly we plan to detect. • DoS and DDoS attacks Denial of service attack aims to make an online service unavailable. It is usually carried out by overwhelming the target with superfluous requests, using multiple sources in case of a Distributed DoS. This type of attack is regarded as one of Fig. 1. An example of network anomaly: uplink hardware failure the most common. (D)DoS attacks often use minimal packet sizes, set an incorrect protocol number and use the same Another examples are the rising number of discarded source/destination addresses. Targeted service experiences a packets, as shown in Fig. 2, and the result of GRE protocol noticeable increase of traffic flow and specific port number being active in Fig. 3. usage. There are multiple subtypes of this attack: UDP Flood, ICMP Ping Flood, SYN Flood, NTP Amplification etc. While unauthorized actions and intentional security breaches might be detectable through traffic analysis, in this • Network scans work we primarily focus on monitoring anomalies caused by network attacks, failures of hardware and changes in network Scanning attacks are performed by making multiple behavior. attempts to connect with different hosts in order to find ports

------27 ------______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

and addresses that are open for connections. This procedure network monitoring" [6]. It became a de-facto standard and is can be performed by network administrators to check security used widely on Cisco devices. There are similar protocols, levels, as well as by potential adversaries to find most notably sFlow and IPFIX, however comparison between vulnerabilities. Network scans can be spotted by observing an them beyond the scope of this article. increase in connections from a certain host, or from multiple hosts to one. Traffic generated by this type of attack is Netflow uses UDP/SCTP protocols to transfer data from comparatively small. routers to special collectors. According to protocol description, all packets with the same source/destination IP address, • Flash-crowd source/destination ports, protocol interface and class of service are grouped into a flow and then packets and bytes are tallied This anomaly can be mistaken for a DDoS attack. It [6]. Then these flows are bundled together and transported to happens when public’s interest towards a particular resource is the Netflow collector server. significantly increased; an example of this is a release of long- awaited program that a group of people rushes to download or Most common version of Netflow is 5 [6]. The following access. One of the ways to distinguish this from DDoS is to information is used in analysis: look at the packet size, which should be noticeably higher compared to the actual attack. • Source and destination addresses • ARP spoofing • Source and destination ports ARP spoofing involves sending falsified ARP (Address • Protocol type Resolution Protocol) messages over a local area network. In • Number of packets their most basic form, ARP spoofing attacks are used to steal sensitive information. Beyond this, ARP spoofing attacks are • Size of packets often used to facilitate other attacks like DDoS attacks, session This data is used to: hijacking (granting attackers access to private systems and data) and man-in-the-middle attacks (intercepting and • aggregate the traffic by different criteria modifying the traffic between victims). Monitoring the • determine the number of connections and transfers network data should reveal an increase in number of packets with conflicting source address information. • spot the types of attacks that use specific ports • Idle scan • determine direction of the flows and usage of protocols

Idle scan is sometimes called a "stealth scan". The main V. AGGREGATION CRITERIA difference from the network scan type of attack is the fact that The following criteria were chosen based on the the attacker does not need to send packets from his own IP information provided by the Netflow protocol: address. Attack itself is performed from a number of "zombie" machines. Even though the target's intrusion detection systems 1) Incoming/outgoing traffic volume for specific might raise the alarm, the true direction is much harder to protocols. determine. The danger of this attack lies in the fact that it can The traffic intensity changes throughout the day, week and produce a list of open ports from the "zombie" point of view; a month, as different types of users connect and disconnect from list that basically exposes the relationship between hosts the network and perform various number of activities. A within the targeted network. perfect detection system would therefore create the network’s Each of the mentioned anomalies has their own set of traits model for all listed periods of time, however it is important to limit the scope of the system to a reasonable level [3]. As by which they can be recognized. In reality, however, it is such, for the purposes of this work normal behavior model is often difficult to spot these among all the information noise assumed to be created for a period of one day. within a network, not to mention that attackers constantly find new ways of performing malicious acts. As such, a detection 2) Usage of ports. system needs to be able to learn new patterns and identify Certain viruses are known to use a specific port to perform them. malicious actions (e.g. AckCmd – port 1054, WinHole – 1081 [7]). A DDoS attack can also target a specific port. IV. NETFLOW PROTOCOL Information volume that gets transferred in even a middle- There is normally an increase in number of connections sized network is immense, and analyzing it all would be very during an attack or spreading of a virus. Analyzing the port resource consuming for a real-time intrusion detection system. usage data might reveal these situations. One of the challenges Netflow protocol provides data about that information without here is distinguishing port scanning from other anomalies, actually storing the information itself, which is perfectly suited since the former is not necessarily performed with malicious for the task. intents. This protocol was initially developed by Cisco for packet 3) Packet size. switching and nowadays "efficiently provides a key set of Viruses and attacks sometimes use specific packet sizes. As services for IP applications, including network traffic mentioned above, DDoS attack differs from "flash-crowd" accounting, usage-based network billing, network planning, behavior in packet size. It is common to use small packet size security, Denial of Service monitoring capabilities, and to increase the number of packets sent. A good example of

------28 ------______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

specific packet size is NTP Amplification attack, which Each neuron in the hidden and output layers has an requires a command of 234 bytes to be sent. activation function for producing non-linear output and propagating it forwards through the network. In this work we 4) Number of open connections. used a sigmoid function: A virus tries to infect as many machines as it possibly can, ͳ and it does so by opening a large number of connections to ሺ ሻ ߪ ݔ ൌ ି௫ different hosts. While Netflow protocol does not provide ͳ൅݁ information on currently opened sessions, it is still possible to where x is a sum of neuron's input xi with weights Ȧi and determine the relation between the number of connections and bias b: port numbers. ݔൌ෍߱ ݔ െܾ 6) Average traffic-to-host value. ௜௝ ௜ While this value will differ widely for various clients and Sigmoid function is bounded, easily differentiable, hosts, certain patterns are still present. Analysis on this criteria monotonic, and produces a smooth output. Small changes in can be done for a specific set of hosts in order to avoid a input coefficients (weights and bias) produce small changes in selection that is too difficult to establish a pattern on. the output of the function, thus making precise tuning possible. Its output range is [0;1], which makes it useful in classification VI. NEURAL NETWORK cases. A. General design For purposes of this work a neural network with 6 inputs, 10 neurons in the hidden layer and 7 output neurons is chosen. An artificial neural network consists of a number of Both output and hidden layer neurons use a sigmoid activation computational units called "neurons". Neurons receive inputs function. and process them, obtaining output. Different functions, known as "activation functions", can be used for this B. Training algorithm calculation, ranging from linear to sigmoid and hyperbolic tangent functions. In order to train the network, we use the backpropagation algorithm [8]. The backpropagation algorithm uses the A typical artificial neural network has three layers – input, gradient descent method to look for the minimum of the error "hidden" and output. Second, hidden, layer makes network's function. A solution is thus the combination of weights that behavior non-linear. The amount of neurons within that layer minimizes the error. and the amount of hidden layers itself can be chosen with different techniques in mind and depends on the conditions of First, the input information is presented to the network and the task, however, for tasks without especially complex propagated forward until it reaches the output layer. Then the computations one layer is usually enough [8]. The amount of desired and actual outputs are compared and the error for each neurons can be an average between input and output neurons, output neuron is calculated. This error is propagated backward but may be increased. through the network, thus giving the error for each neuron in all hidden layers. Using these values, a backpropagation The output layer has several neurons, one for each anomaly algorithm can update weights and biases. the system is going to be able to detect, plus one for an "unknown" anomaly and one for "normal" behavior. These Initial weights of the network are selected at random. neurons output values between 0 and 1, which are considered When input xi is presented to the network, it is propagated to be the probabilities of a relevant anomaly occurring. through the network, producing an output oi. The goal of the training algorithm is to make the output oi close or identical to Each connection has a weight; neurons in the hidden and the desired output ti for each input. This is done by minimizing output layers have biases. Weights and biases are adjusted in the error function: order to reduce the overall output error in both the training and ௡ deployment. ͳ ൌ ෍ሺ݋ െݐ ሻଶܧ ʹ ௜ ௜ ௜ୀଵ First, the error signal in the output layer k is calculated:

ȟ୩ ൌ–୩ െ‘୩

ߜ௞ ൌȟ୩ߙԢ௞ (1)

where ߙԢ௞ is a derivative of the activation function. For the output layer this derivative equals 1. The weights of the output layer are adjusted according to:

(ȟ߱௝௞ ൌݔ௞ߜ௞ߛ (2

where ݔ௞ is the input from a neuron in the previous layer (i.e. the output of the relative neuron in the hidden layer), ߛ is Fig. 4. A basic neural network structure the . This learning rate is typically a small number

------29 ------______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

(eg. 0.004), regulating the speed at which the weights are data. The full list of anomalies added to the dataset is as adjusted. Big learning rate values may cause the network’s follows: outputs to oscillate around the target, thus never converging on •DDoS UDP flood a solution; small values might cause the learning process to be very slow. • DDoS TCP flood It is worth noting that the gradient descend method has a •Port scan downside in that it might get “stuck” at the local minimum of • Idle scan the error function. In order to get over the “small hill” and continue moving towards a global minimum, we can modify • ARP spoofing the equation (2) as follows: • "custom" anomaly ௡ ௡ିଵ ȟ߱௝௞ ൌݔ௞ߜ௞ߛ൅ȟ߱௝௞ ߮ The "custom" anomaly was created to test the neural network's ability to spot an unknown class of anomalies. It where ߮ is a momentum factor. The introduction of the consists of several random port/packet usage combinations, momentum accelerates the learning process by keeping track simulating the way some viruses work. of the previous changes, thus allowing the algorithm to move in larger steps. The faster movement prevents the network Each anomaly in the dataset was labeled and fed into the from settling in a local minimum by helping it move past the neural network. Backpropagation algorithm adjusted the “hill”. weights and biases each time the outputs classified an anomaly. This process was done for 150000 and 300000 The error signal for the nodes in hidden layer is calculated iterations with the learning factor Ȗ of 0.004. in a similar way to the output layer. The outcomes for different anomalies can be seen in Table I for 150000 iterations and Table II for 300000 iterations. ߜ௝ ൌߙԢ௝ ෍߱௝௞ߜ௝

TABLE I. CLASSIFICATION RESULTS AFTER 150000 ITERATIONS where σ ߱௝௞ߜ௝ is the weighted error signal. A derivative of the activation function for the hidden layer is: Type Accuracy Quantity in dataset DDoS UDP flood 0.87 10 ߙԢ ൌ݋ ሺͳ െ ݋ ሻ ௝ ௝ ௝ DDoS TCP flood 0.79 10 therefore Port scan 0.84 10 Idle scan 0.76 8 (3) ߜ௝ ൌ݋௝ሺͳ െ ݋௝ሻ෍߱௝௞ߜ௝ ARP spoofing 0.71 8 custom 0.63 5 normal 0.95 - Weights of the hidden layer are updated in the same way as the output’s: TABLE II. CLASSIFICATION RESULTS AFTER 300000 ITERATIONS

௡ ௡ିଵ Type Accuracy Quantity in dataset ȟ߱௜௝ ൌݔ௝ߜ௝ߛ൅ȟ߱௜௝ ߮ DDoS UDP flood 0.91 10 VII. DATASET AND EVALUATION DDoS TCP flood 0.85 10 Port scan 0.96 10 Section V presented a list of aggregation criteria. Data acquired through aggregation is used as an input to the Idle scan 0.78 8 artificial neural network. However, acquiring a good labeled ARP spoofing 0.83 8 dataset is quite a challenging task. In order to correctly train custom 0.85 5 the network and not have it "overfit", the dataset should to be normal 0.96 - sufficiently large and diverse; it should include enough for the network to be able to detect the patterns As we can see, the neural network shows promising results between the inputs and the desired outputs. in detecting both known types of anomalies and new ones. It is also apparent that detection accuracy greatly depends on the For this work the data was collected from the local ISP quality of the training data and the number of learning network with several hundred L2 nodes for a period of one iterations. month. Dataset collected this way is considered to show "normal" behavior before any modifications are made. For Given the current configuration, a possible solution to testing purposes, several small-scale anomalies were created: improve the detection rates is to obtain a better represented DoS and DDoS attacks, port scans, email spamming and dataset, as well as optimize the aggregation criteria. routers turning off. Data was then modified with the Flame tool, as suggested in [9]. VIII. CONCLUSIONS Flows in the dataset were edited by adding and modifying This paper presented a way of using an artificial neural Netflow records in order to simulate suspicious activity. network as an anomaly detection and classification tool. A Patterns of known anomalies were injected into the gathered Netflow protocol was used in order to obtain and process the

------30 ------______PROCEEDING OF THE 20TH CONFERENCE OF FRUCT ASSOCIATION

data, which was then aggregated based on several properties. [4] Bhuyan, Monowar H., Dhruba Kumar Bhattacharyya, and Jugal K. The results show high percentages of successful identifications Kalita, "Network anomaly detection: methods, systems and tools." Ieee communications surveys & tutorials 16.1, pp. 303-336, 2014. after a number of iterations. [5] Hansman, Simon, and Ray Hunt. "A taxonomy of network and Further work will be aimed at reducing the false positive computer attacks." Computers & Security 24.1, pp. 31-43, 2005. [6] Cisco official site, Cisco IOS NetFlow rates and optimizing the aggregation algorithms. http://www.cisco.com/c/en/us/products/collateral/ios-nx-os- software/ios-netflow/prod_white_paper0900aecd80406232.html REFERENCES [7] Ahmad, Muhammad Aminu, Steve Woodhead, and Diane Gan, [1] C. A. Carver, J. M. D. Hill and U. W. Pooch, "Limiting Uncertainty "Early containment of fast network worm malware." Information and in Intrusion Response", 2001 IEEE Man Systems and Cybernetics Computer Science (NICS), 2016 3rd National Foundation for Science Information Assurance Workshop, pp. 142-147, New York, June and Technology Development Conference on. IEEE, 2016. 2001. [8] Demuth, Howard B., et al. Neural network design. Martin Hagan, [2] Robin Sommer, "Outside the Closed World: On Using Machine 2014. Learning For Network Intrusion Detection, SP '10 Proceedings of the [9] Cynthia Wagner, Jerome Francois, Radu State, Thomas Engel, 2010 IEEE Symposium on Security and Privacy, pp. 305-316, 2010. "Machine Learning Approach for IP-Flow Record Anomaly [3] Salima Omar, Asri Ngadi, Hamid H. Jebur, "Machine Learning Detection", IFIP Networking 2011, Valencia, Spain, May 2011. Techniques for Anomaly Detection: An Overview", International [10] Hofstede, Rick, et al. "Flow monitoring explained: From packet Journal of Computer Applications (0975 – 8887), Volume 79 – No.2, capture to with netflow and ipfix.", IEEE October 2013. Communications Surveys & Tutorials 16.4, 2014, pp. 2037-2064.

------31 ------