Masaryk University Faculty of Informatics

Detection of network attacks using HTTP related information

Master’s Thesis

Lenka Kuníková

Brno, Spring 2017


This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Lenka Kuníková

Advisor: RNDr. Pavel Minařík, PhD.


Acknowledgement

I would like to thank my advisor RNDr. Pavel Minařík, PhD. and Mgr. Martin Juřen for their guidance and useful advice.

Abstract

This thesis deals with extended HTTP network flows and their application for the detection of various attacks and anomalies on the network. It highlights the advantages of extended HTTP flows for chosen attacks, implements and tests existing detection methods, and suggests numerous improvements. Furthermore, the thesis analyses the User-Agent request header in detail. It describes how this field can be used for anomaly detection and explains the problems related to User-Agent analysis.

Keywords

Flow monitoring, HTTP, Anomaly detection, User-Agent, Brute-force attack, SQL injection


Contents

1 Introduction 1

2 HTTP 3
2.1 Basic concepts 3
2.2 URI 4
2.3 Message format 5
2.3.1 HTTP request 5
2.3.2 Methods 7
2.3.3 Response message 9
2.3.4 Status code 10
2.4 Architectural Components of the Web 11
2.4.1 Virtual hosting 11
2.4.2 Proxy servers 12
2.4.3 Caching 13
2.4.4 Gateways 14
2.4.5 Tunnels 14
2.5 Authentication and secure HTTP 14
2.6 HTTP/2 16

3 User-Agent 17
3.1 Format 18
3.1.1 Non-browser 18
3.1.2 Browser 19
3.2 Compatibility and spoofing 20
3.3 User-agents in MU network 21

4 Network monitoring 23
4.1 Experiment setup 25
4.2 Shortcomings of monitoring method 26

5 Network Scanning 27
5.1 HTTP scanning 28
5.1.1 Incoming traffic 28
5.1.2 Outgoing traffic 29
5.2 Directory traversal 31
5.3 Summary 32

6 Brute Force Attacks 33
6.1 Targeted authentication methods 34
6.2 Attacks against HTTP based authentication 34
6.3 Attacks against authentication using POST method 35
6.4 Attacks against authentication using GET method 40
6.5 Summary 41

7 Code injection 43
7.1 SQL injection 43
7.1.1 Basic principle 44
7.1.2 Attack types 44
7.1.3 Detection 45
7.2 Cross-site scripting 47
7.2.1 Detection 48
7.3 Summary 50

8 User agent anomalies 51
8.1 Blacklists and pattern matching 52
8.1.1 Known malicious UA strings 52
8.1.2 Company policies and unwanted software 54
8.1.3 Code injection 55
8.2 Missing User-Agent 55
8.3 Many User-Agents from one IP address 57
8.3.1 Outgoing traffic 58
8.3.2 Incoming traffic 59
8.4 Unusual User-Agent 61
8.5 Discrepant User-Agent 63
8.6 Summary 65

9 Conclusion 67

A List of created scripts 69

Bibliography 71

1 Introduction

HTTP is one of the most commonly used application layer protocols. Every time a person tries to display a web page in a browser, the communication is carried out by HTTP or its secure version HTTPS. A significant part of network traffic is therefore performed via HTTP. However, everything that is frequently used is also frequently misused, and HTTP is no exception. On one hand, HTTP servers are common targets of various types of attacks, for example in order to take control over the server. On the other hand, botnets often use HTTP for their communication because it is easily hidden in the rest of the network traffic. Consequently, network administrators monitor their networks to detect and mitigate this malicious behaviour.

There are two main approaches to network monitoring: deep packet inspection and monitoring of network flows. This thesis deals with the second option. Although flows originally contained only information from the 3rd and 4th layer of the ISO OSI model, extended flows also support the export of fields from application layer protocols, including HTTP.

The aim of this thesis is to explore how HTTP fields can be used to detect various anomalies and attacks on the network. The thesis analyses chosen attack types that can be identified thanks to extended flows. It describes existing detection methods, tests them on data from the Masaryk University network and, where possible, suggests and implements several improvements. Furthermore, the thesis deals with the HTTP User-Agent field in detail. It highlights the diversity of User-Agents, explains the problems related to their analysis and outlines how this field can be used for anomaly detection. Every described method is also implemented and tested on real traffic.

The thesis begins with a theoretical chapter about the HTTP protocol, followed by a chapter dedicated to the User-Agent field. Chapter 4 briefly explains flow monitoring and describes the setup used for the experiments. The following chapters are dedicated to various attacks and anomalies. The first of them focuses on network scanning at the HTTP level, chapter 6 deals with brute force attacks, chapter 7 describes two attacks based on code injection and the possibilities of detecting them, and chapter 8 presents five distinct concepts of how to use the User-Agent field for anomaly detection.

2 HTTP

The Hypertext Transfer Protocol (HTTP) represents the base protocol for accessing the World Wide Web. It first appeared shortly after Tim Berners-Lee introduced a proposal of the World Wide Web in 1989. With his team at CERN, they were responsible for the creation of HTTP as well as the Hypertext Markup Language (HTML) [1]. Since its first documented version, HTTP/0.9, the protocol has undergone multiple important changes, but it still remains one of the most ubiquitous application layer protocols. Despite the fact that its newest version, HTTP/2, was published in 2015, this chapter describes the previous version – HTTP/1.1. The theory explained in the rest of the chapter is mostly based on RFC 7230 [2], defining the HTTP message format, and RFC 7231 [3], defining the semantics.

2.1 Basic concepts

HTTP uses a client-server model. The protocol defines the syntax and semantics of the messages that a client and a server exchange in order to deliver the web page the client has requested. Clients are usually represented by web browsers, but they are not the only option. An HTTP client can also be an antivirus program checking for updates, or a web crawler that helps an Internet search engine to create its database. Among HTTP servers, the most commonly used are Apache and Microsoft Internet Information Server.

Servers store web resources. A resource can be a simple HTML file, an image or dynamically generated content. Such objects are addressable by a Uniform Resource Identifier (URI). The client initiates a connection, creates a request for an object on a specified URI, and sends this request to a server. The server retrieves the requested object from its storage and sends it back to the client in an HTTP response message.

HTTP presumes a reliable, connection-oriented transport layer protocol. Therefore, HTTP does not address problems related to missing packets or their reordering because it assumes everything was delivered successfully. Normally, HTTP runs on top of the Transmission Control Protocol (TCP) and the default port is 80, but if the number

is explicitly stated, any port can be used. The common alternative is 8080. HTTP can use both persistent and non-persistent connections. In the case of non-persistent connections, each request/response pair is sent over a different TCP connection. In version 1.1 of the protocol, persistent connections are the default; multiple requests are combined into a single connection in order to reduce response delay. Another important property of HTTP is that it is stateless. This means that the server is not required to keep track of information about users over the course of multiple requests. Each request needs to be standalone and contain all the information necessary to satisfy it [4].

2.2 URI

A URI is a sequence of characters used to identify a resource. The most common type of URI is the Uniform Resource Locator (URL). It is a subset of URIs that, in addition to identifying a resource, provides the means of locating it [5]. Another option is the Uniform Resource Name (URN), which does not provide a way to locate the resource – it is location independent. URNs are not widely adopted and they will not be further discussed. The thesis only focuses on URLs, and those used in HTTP satisfy the following syntax:

scheme://host:port/path?query#fragment

The scheme defines the protocol used, which is HTTP in this case. The host component identifies the server hosting the resource. It can be either in the form of a hostname or an IP address. The next field determines the port the requested server is listening on. In HTTP, 80 is the default value. The path field specifies the location of the resource on the server. There is no official format for the query component, but key=value pairs separated by an ampersand (&) are commonly used. They can further specify the requested resource. For example, the content of web forms can be placed there. Usage of the last component (fragment) can be easily explained when referring to an HTML page. The fragment allows an entity inside the HTML, like a concrete paragraph, to be identified. The character set used by URLs is very limited. Only letters of the basic Latin alphabet, digits, and certain special characters are allowed. Some of these characters are reserved and they have special meaning,


for example: the question mark (?), colon (:), or hash (#). Others are not reserved and can be used arbitrarily, such as the dot (.), underscore (_) or tilde (~). This restricted charset is sometimes insufficient. When letters from different alphabets, reserved characters, or non-printable control characters are part of a URL, they need to be encoded. Every octet of the UTF-8 (8-bit Unicode Transformation Format) representation of a character is replaced by a triple: a percent sign and two hexadecimal digits. Therefore a space will be replaced by %20 and "ô" by %C3%B4.
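This encoding is easy to reproduce in Python (the language of the scripts attached to this thesis); a minimal sketch using only the standard library:

# URL percent-encoding with the Python standard library; the strings
# correspond to the examples in the text above.
from urllib.parse import quote

print(quote(" "))       # %20 (space is encoded)
print(quote("ô"))       # %C3%B4 (one %XX triple per UTF-8 octet)
print(quote("a.b_c~"))  # a.b_c~ (unreserved characters are left as-is)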

2.3 Message format

There are two types of HTTP messages: requests and responses. Requests are sent from clients to servers, and responses the other way around. The formats of both are presented in the following sections.

2.3.1 HTTP request

An HTTP request starts with a request line containing the method used, a requested URL and the version of the protocol. The request line is followed by zero or more header lines and then, separated by an empty line, an optional message body (see fig. 2.1). The method used defines the action that is supposed to be performed on the resource. An object can be retrieved, updated, or deleted. There are 8 basic methods: GET, HEAD, POST, PUT, DELETE, OPTIONS, TRACE and CONNECT. The protocol also allows custom methods to be defined.

The next field of the request line belongs to the URL. It is common that instead of the absolute URL just its part is used, starting from the path. The hostname and potentially the port number are defined in the Host header. When the request concerns the server as a whole and not a particular resource, an asterisk (*) appears in place of the URL. The last part of the starting line is dedicated to the version number. The most commonly used version, and the one presented in this chapter, is 1.1.

HTTP headers follow the starting line. A large number of them exists, but only a selected subset is presented in this thesis. For the full list, RFC 7231 [3] should be consulted. Headers contain additional information regarding the request, the communicating parties or the communication itself. They have a simple format: name and value


Figure 2.1: HTTP request message format [6]

separated by a colon. The most frequently used headers include the already mentioned Host, User-Agent and headers responsible for cache control or content negotiation. The most important ones for the purpose of this thesis are Host, User-Agent and Referer¹.

Host It contains the domain name of the server hosting the requested resource and a port number, if the default one is not used. This header is mandatory and it needs to be filled in even if the URL in the request line is in its absolute form. Although the field may seem redundant, since the server is already identified by the underlying IP protocol, it is important in the case of virtual hosting (see section 2.4.1).

Referer It identifies the resource from which the targeted URL was obtained. When the user clicks on a link in a web page, the URL of this web page should be placed in the Referer field. When the URL does not have a specific source (for example, it is typed by the user), this header is omitted. There are some security related concerns about the usage

1. The name is misspelled on purpose. There was an error in the original specification, and the incorrect spelling is still used for compatibility.


of this field, since it reveals information about the browsing history of the user. Some intermediaries (proxies or installed security software) may in some cases strip the Referer away. Information contained in this header is mostly used for generating back-links, various statistics or logging.

User-Agent The field contains information about the application that created the request. Because of its importance for this thesis, it will be covered in a separate chapter.

The header lines are followed by the content of the message. The message body is not mandatory, and whether it is present depends mostly on the method used. For example, the client can put the content of web page forms there. Unlike headers, the message body does not have to be ASCII text. The message can also contain images, videos or other binary content. A typical example of an HTTP request is given below.

GET / HTTP/1.1
Host: is.muni.cz
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

2.3.2 Methods

Although there are 8 official HTTP methods, just a few of them are commonly used. Not every method needs to be implemented on every resource. Figure 2.2 shows data collected on the network of Masaryk University (MU) over the period of one hour. As can be seen, GET and POST are the most prevalent ones, while the TRACE and DELETE methods did not occur at all.

GET The basic method of the HTTP protocol asks for a resource on a server specified by the URL. The GET message does not contain any


Figure 2.2: Portion of used HTTP methods at MU network

entity body. Although it is not its main purpose, GET also allows clients to send the contents of HTML forms to the server. This data is inserted into the query part of the URL. It is important to realise that this data is stored in browser history and eventually on a cache server. The method can also be used in the form of a conditional GET, when a requested entity is supposed to be sent only under certain circumstances. Conditional GET is intended to reduce unnecessary network usage. This concept is closely related to caching and it will be further elaborated on in section 2.4.3. Another way to reduce network congestion is the use of a partial GET. In this case, just part of the identified entity is sent.

POST This method was designed to send a block of data to the server. The actual action performed on the data is determined by the server. Contrary to the GET method, input data is sent in the message body and not directly in the URL. There are many guides describing which method should be used when dealing with forms. As a general rule, the GET method should be used for idempotent actions. Sensitive, binary or lengthy data should be passed by the POST method.
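The difference in where the form data travels can be illustrated with a short sketch; the host example.com and the form fields are made-up placeholders:

# Hypothetical form submission; the host and the fields are made up.
from urllib.parse import urlencode
from urllib.request import Request

fields = urlencode({"user": "alice", "lang": "en"})  # user=alice&lang=en

get_req = Request("http://example.com/form?" + fields)               # data in the URL query
post_req = Request("http://example.com/form", data=fields.encode())  # data in the message body

print(get_req.get_method(), get_req.full_url)    # GET http://example.com/form?user=alice&lang=en
print(post_req.get_method(), post_req.full_url)  # POST http://example.com/form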

HEAD This method is similar to GET, except that the server does not send back the entity body, just the headers. It is used to get meta-information about the resource without actually transferring it.


OPTIONS With this request, a client can ask the server about its capabilities. An asterisk can be used instead of a URL, meaning that the client wants information about the server itself, and not a particular resource.

CONNECT Used to establish an HTTP tunnel – another network protocol encapsulated using HTTP. The request with the CONNECT method is sent to the proxy responsible for the tunnelling.

PUT Requests that the entity sent as the message body is stored on the server under the supplied URL. The server can either create a new resource or replace an existing one. This is often used with web publishing tools.

DELETE The method asks the server to remove the resource at the supplied URL. It is up to the server to decide how to react to such a request. According to the specification, the resource does not have to be deleted.

TRACE The method is used for diagnostic purposes. It allows a loopback of the request to be invoked. The final recipient should reflect the request in the message body of its response. An HTTP message can pass through several intermediaries on its way to the server, and each of them has the possibility to change the request. This method allows the client to see what the request looks like when it reaches its final destination.

2.3.3 Response message

An HTTP response starts with a status line consisting of an HTTP version, a status code, and a reason phrase. It is followed by zero or more header lines and then, separated by an empty line, a message body, which can be empty. The status code is a 3-digit number describing the result of the request. There are 5 classes of response codes, which will be described in the next section. In the next field, the status code is explained in a human-readable format as a reason phrase. Similarly to the HTTP request, the next part consists of headers. These headers usually specify the type,

encoding and length of the sent entity, can be related to cache control, or they can specify the application that sent the response. A typical example of an HTTP response (omitting the message body) is given below.

HTTP/1.1 200 OK
Server: Apache/2.4.10 (Debian)
Cache-Control: max-age=15
Expires: Fri, 24 Mar 2017 22:46:22 GMT
Last-Modified: Fri, 24 Mar 2017 22:46:08 GMT
Content-Encoding: gzip
Content-Length: 16821
Content-Type: text/html; charset=utf-8
Date: Fri, 24 Mar 2017 22:46:08 GMT
Via: 1.1 varnish
Connection: keep-alive

2.3.4 Status code

Not all HTTP requests are successful. Sometimes the method used is not implemented, the resource does not exist, or the client does not have permission to access it. The status code is the way the client is informed about what happened with its request. According to the first digit, status codes are divided into 5 categories. For each category, several codes are given for demonstration.

∙ 1xx: Informational – This class of codes is not widely used; it has only two defined representatives. 100 Continue is intended for optimization, and 101 Switching Protocols indicates that the server agrees with the client’s request to change the communication protocol.

∙ 2xx: Success – These codes indicate that the request was processed successfully. 200 OK is the usual answer to a GET request; the requested resource is included in the entity body. 206 Partial Content is a positive answer to a partial GET request, while 204 No Content informs about a successful request for which no data is sent back to the client.


∙ 3xx: Redirection – This class of codes usually means that the resource was permanently (301 Moved Permanently) or temporarily (302 Found) moved to another location. To access the resource, a new request to the URL specified in a Location header needs to be sent. Code 304 Not Modified has a different meaning. It is used in conjunction with a conditional GET request and indicates that the client still has an up-to-date version of the resource.

∙ 4xx: Client Error – A class of codes used when there is a problem with the request sent by the client and the request cannot be fulfilled. When there is a syntactical issue with the request, the server sends code 400 Bad Request. If the resource is not present on the server, 404 Not Found is sent. Code 401 Unauthorized indicates that authentication is needed prior to accessing the resource.

∙ 5xx: Server Error – This class covers cases when the request is valid but for some reason it cannot be accomplished. The most common and most general response is 500 Internal Server Error. 503 Service Unavailable indicates a temporary problem, and it can include a Retry-After header specifying when the client should try to resend its request.

2.4 Architectural Components of the Web

Although in the simplest scenario the client and the server are the only two participants in the HTTP protocol, in many cases an intermediary element plays an important role in the communication. This chapter presents such elements, namely proxy servers, gateways and tunnels. Furthermore, it covers techniques such as virtual hosting and caching. The mentioned mechanisms are not crucial for the rest of this thesis, but they are included to preserve the completeness of the chapter.

2.4.1 Virtual hosting

In the early days of HTTP, each web server hosted one web site at most. In such conditions, it was unnecessary to specify the hostname in a request, so only relative URLs were used to specify the path to the document. A problem appeared with the expansion of virtual hosting.


The main idea behind virtual hosting is that a simple web page does not always exhaust the resources of the web server, so it is convenient to share the capacities of servers between several customers. Each customer has a completely independent web page, but all of them share the same physical server. In such circumstances, relative URLs became a problem. When a request for index.html arrives at a server, it does not know which virtual host is being accessed. That is the reason behind the introduction of the Host header, which became compulsory in version 1.1.

2.4.2 Proxy servers

Proxies are intermediaries between the client and the server, and they need to be able to implement the functionality of both. A proxy receives the client’s request and forwards it to the real server. As it comes into contact with the passing HTTP traffic, a proxy can modify it to implement many useful value-added web services. There is a wide range of scenarios where it can be used. Security can be enhanced by implementing a uniform access control strategy in corporate settings, or by employing an anonymizer removing identifying characteristics from HTTP messages. A proxy can also improve performance. One way is to implement web caching, another is to route requests to web servers based on Internet traffic conditions [6].

There are four basic techniques for redirecting an HTTP request to a proxy instead of the original server. First, the client’s browser can be configured directly to use a proxy. When a client sends a request to such a proxy, the request line contains the full URL, so the proxy can forward the request to the original recipient easily even in the absence of the Host header in older versions of the protocol. The second option is a surrogate (a proxy server placed in front of the web server). In this case, the proxy adopts the name and IP address of the web server. Another option is to modify the web server so that it sends a 305 Use Proxy response, with the proxy URL specified in a Location header, to the client on each request. The last option is the so-called transparent/intercepting proxy. This is achieved by modifying the network structure (using routing or switching techniques).


2.4.3 Caching

Caches are intended to keep copies of the most popular resources. When the client’s request goes through a web cache and the requested document is present, the cache can forward it to the client without the need to contact the original server. This approach has several advantages. It reduces redundant data transfers, the demand on the original server, and distance delays. Since web content often changes, copies in a cache can become obsolete. There needs to be a mechanism ensuring that the client always receives fresh data. Four different situations may arise [6].

∙ Cache miss – The resource is not present in a cache and it must be retrieved from a server. Its copy is saved in the cache and it is forwarded to a client.

∙ Cache hit – The requested object is stored in the cache and it is fresh enough. It can be sent directly back to the client without the need to recontact the server. To determine whether the copy is fresh, the headers Cache-Control: max-age or Expires can be consulted. Max-age defines the number of seconds that the document is valid after it was originally retrieved from the server. Expires specifies the exact date and time when the document becomes stale.

∙ Revalidate hit – When the resource is stored in the cache but its lifetime has expired, it needs to be revalidated. The conditional GET method is used for this purpose (see the sketch after this list). The condition is expressed in an HTTP header; If-Modified-Since and If-None-Match are the most popular ones. If-Modified-Since compares the time specified in the Last-Modified header of the stored resource with the time of the last modification stored on the original server. If-None-Match works differently. The server provides a special header, Etag, that acts as a serial number. When a document is changed on the server, it receives a different Etag. When these tags match, it means the copy on the server and the one in the cache are the same. If the condition is satisfied, the server replies with 304 Not Modified and the copy in the cache becomes valid again.


∙ Revalidate miss – When the conditional GET is evaluated as false, the document needs to be resent. The cache saves its new version and forwards a copy to the client.
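A revalidation can be sketched in Python as follows; the URL, the Etag and the date below are hypothetical placeholders, and a real cache would take the validators from the previously stored response:

# Cache revalidation via a conditional GET; URL and validators are made up.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

req = Request("http://example.com/index.html", headers={
    "If-None-Match": '"686897696a7c876b7e"',              # Etag of the cached copy
    "If-Modified-Since": "Fri, 24 Mar 2017 22:46:08 GMT",
})
try:
    resp = urlopen(req)
    print(resp.status)        # 200: revalidate miss, fresh copy in the body
except HTTPError as err:
    if err.code == 304:       # revalidate hit: the cached copy is still valid
        print("cached copy is up to date")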

2.4.4 Gateways

Unlike an HTTP proxy, which is an intermediary between two parties communicating using the same protocol, a gateway allows protocol conversion. The term gateway is used in two different types of situations. One example is when we want to access a resource on an FTP server using an HTTP request. There must be a gateway on the way that understands the HTTP request, transforms it to FTP, contacts the FTP server, gets the result, and sends it back to the client as an HTTP response. Another use of this term concerns application servers. Nowadays, people use HTTP not only to access a specific resource, but also to interact with a wide range of applications. In this case, the gateway is part of the application server. It communicates with clients using the HTTP protocol and transforms their requests into commands for programs running on the server side.

2.4.5 Tunnels

Tunnelling is a way to encapsulate non-HTTP traffic using the HTTP protocol. It is used to bypass firewalls which do not allow a specific protocol, or to enable forwarding of the traffic through a proxy that does not support the intended protocol. To create such a tunnel, a request with the CONNECT method is sent to a remote proxy. The proxy establishes the connection with the remote server, and when the client receives a 200 Connection established response, it can start sending its data, which is blindly relayed through the proxy.

2.5 Authentication and secure HTTP

Sometimes it is desirable that the data sent through a network remains confidential, or that access to it is restricted. The following chapter describes the possibilities the HTTP protocol provides to achieve this goal.


HTTP comes with two authentication schemes: basic [7] and digest authentication [8]. Moreover, it provides a way to implement additional schemes using the authentication framework defined in RFC 7235 [9]. However, it does not provide a way to encrypt the data, and if encryption is desired, HTTPS should be used.

Basic authentication An HTTP server may divide its resources into groups called security realms and then associate different access rights with each realm. Once a client sends a request for a restricted document, the server returns a 401 Unauthorized response with a WWW-Authenticate header specifying the accessed realm. The client needs to know the username and the password, join these two together with a colon, and encode the result in BASE64 format. The resulting string is put in the Authorization header of a new request. If the credentials are valid, the server sends back the requested document. This authentication scheme cannot be considered secure because passwords are practically sent in plaintext. The scheme is supposed to be used in a friendly environment where confidentiality is not a necessary feature, but a convenient one. The secure way to use basic authentication is in conjunction with encrypted data transmission, such as SSL.
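The construction of the header is simple enough to show directly; in the sketch below, the username and password are made up:

# Building the Authorization header for basic authentication.
import base64

credentials = "alice:secret"   # username and password joined by a colon
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
print("Authorization: Basic " + token)   # Authorization: Basic YWxpY2U6c2VjcmV0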

Digest authentication The scheme was invented to overcome the flaws of basic authentication. It added two key features. Firstly, the password is never sent as plaintext; instead, its message digest is used. The second important feature is the use of nonces – unique server-specified strings created for each request. These are added to the message digest to prevent replay attacks.

HTTPS The increasingly sensitive nature of the data travelling through the HTTP protocol requires a mechanism that ensures secure communication. HTTPS is basically HTTP sent over a secure channel created by SSL (Secure Sockets Layer)/TLS (Transport Layer Security). It protects the privacy of the whole HTTP message, including the method, URL and headers, and it supports server-side or mutual authentication. Nowadays, this protocol is widely adopted and it is used every time some level of security is required. Despite its popularity and wide range of use, this thesis will not deal

with HTTPS traffic. The main reason is that the URL and header fields used for the anomaly detection are encrypted in HTTPS and therefore unusable.

2.6 HTTP/2

HTTP/2 is the second major revision of the HTTP protocol, published in 2015. This new version is based on the SPDY protocol originally developed by Google. Compared to version 1.1, HTTP/2 leaves most of the high-level syntax intact; the changes are mostly related to data representation and transport. The new protocol is binary and thus more efficient to parse and less error-prone than the previous version. Other important modifications include multiplexing of requests and responses, header compression and a server push mechanism.

According to W3Techs, HTTP/2 is currently used by approximately 14% of all websites [10]. Despite the fact that encryption is not compulsory for the new version, the majority of web browsers decided not to support unencrypted HTTP/2 traffic. This is the main reason why the thesis only focuses on the previous version of the protocol.

3 User-Agent

HTTP is widely used in application software. The application software is often inhomogeneous; for example, it can differ in rendering capabilities. An HTTP server needs to know what type of application it is communicating with in order to adjust the content to the client’s needs. A demonstrative example is a server that uses different page layouts when sending a response to a browser on a mobile device and to an ordinary desktop browser. For this purpose, HTTP provides the User-Agent (UA) field, which identifies the browser or application being used and can also include some details about the underlying platform. According to the specification, the field is not compulsory, but the client is supposed to include it. Initially it had 3 main use cases [11].

∙ Statistical purposes – It may be interesting for a developer to know what kind of software people most commonly use to access their websites. Such information can be used for targeted advertisement.

∙ Tracing of protocol violations – Mostly used in the early days of HTTP. If a server is receiving a large amount of badly formatted requests, the administrator can check whether they are coming from the same application software. The administrator can then warn the developers of the client software about the protocol violation, or at least block requests from such software to protect the web server.

∙ Tailored responses – The most prevalent use of the User-Agent nowadays. Different content is sent to different users according to their client software. This technique is often denoted as User-Agent sniffing. The server can use special, browser-dependent features or send custom JavaScript or CSS files according to the capabilities of the receiving application. Even a completely different page layout may be sent to mobile devices. Nevertheless, the trend in web development is to avoid User-Agent sniffing, because the obtained information is not reliable.

3.1 Format

The RFC does not provide many details about how the User-Agent field should look. According to the latest version [3], the field should identify the user agent software and its significant subproducts. It consists of one or more product tokens, optionally followed by a comment. A product token contains a name and a possible version number. These product identifiers should be given in decreasing order of their significance. Furthermore, the RFC encourages clients not to use the identifiers of other applications in order to declare compatibility with them. However, this advice is often not taken into account. The current situation around User-Agents is chaotic because of the vague definition of the field’s format and purpose in the specification. While major browsers usually provide information about the format of their UAs and the meaning of individual tokens, the situation is even less transparent with non-browser applications.

3.1.1 Non-browser

Non-browser UAs are miscellaneous and information about them is often missing from the application documentation. If an unknown UA appears on the network, it is a non-trivial task to find the application it belongs to. In the best case scenario, the format is simple, including one token with the application name and its version:

Microsoft-CryptoAPI/10.0

This UA is sent by applications using the standard Microsoft Windows cryptographic library present on Windows 10. However, much more complex User-Agent strings appear on the network. A demonstrative example is the one sent by ESET antivirus software:

ERA Update (Windows; U; 32bit; RMV 1034; RAV 5.3.33.0; OS: 6.1.7601 SP 1.0 NT; Mirror; TDB 32668; x64s; APP era; PX 0; HWF: 01000000-2222-2222-0000-AAAAAAAAAAAA; PLOC en_us; RAF 1.0.0.1.; BPC 5.3.33.0; RACL 720.0.10; RALG 3731.0.90209.0.0.0.0.0.358.0; RART 0.0.0; RARI 0.-1; RAGR 1.11.0; RAPL 1; RANT 0; RARP 0; RAUP 1; RATS 8.71557)

It includes the software identification (ESET Remote Administrator), detailed information about the running operating system, a hardware fingerprint (anonymised in this example), and some application related data.


However, ESET does not specify this format anywhere, and the meaning of the individual subparts can only be guessed. User-Agents sent when playing certain online games may appear even more unusual:

ros ZxLmlgIv17oTL4HiBifK5lpiheQIYcrgWiPX

Such a UA is probably sent when playing an online version of the well-known game GTA (Grand Theft Auto) on the server rockstargames.com. This information is based on a thread [12] on Reddit where fans try to analyse the network traffic generated by this game. No other source of information could be found. Moreover, the User-Agent changes with every request and only the first three characters remain stable. There is another category of applications that do not identify themselves correctly. Instead, they pretend to be a browser. Some servers do not process requests that are not coming from regular browsers, so if a non-browser application wants to access such servers, it needs to fake its UA.

3.1.2 Browser

In the case of browser User-Agents, the situation is different and their format is more regular. It usually starts with the keyword Mozilla and a version, followed by platform details enclosed in brackets, the rendering engine used (with compatibility comments), and finally the browser name and version (possibly more than one):

Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36

Most contemporary browsers present themselves as Mozilla for compatibility reasons. In the example above, the underlying operating system is Windows 10. WOW64 indicates the presence of an x86 emulator that allows 32-bit applications to run in a 64-bit environment [13]. The client uses a version of Chrome from February 2017. Although the Safari keyword is present in the UA string, it does not mean that the Safari browser is installed on the user’s computer.

WebKit, KHTML and Gecko are examples of rendering engines. Such an engine is the component of the web browser responsible for content rendering. Although all of them are mentioned, none of them is actually used by

the client. This version of Chrome runs Blink, which was forked from WebKit in 2013 [14]. The technique when browsers claim to be something other than what they actually are is called User-Agent spoofing, and nowadays it is a common practice. There are only a few cases when a browser’s UA does not start with Mozilla (very old browsers being one of them). The User-Agents of Internet Explorer create further complications. For many years, it allowed third party applications to inject their identification into the UA. If an installed program adds its token to a specific registry key, it will appear in the UA. This leads to two main problems. Firstly, the field can become excessively long; secondly, it reveals too much information about installed software, which can be abused. Although Internet Explorer renounced this approach in 2010 with version IE9 [15], it is still not uncommon to see a UA of the following format:

Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Microsoft Outlook 16.0.4498; ms-office; MSOffice 16)
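Because the browser format is comparatively regular, its main parts can be pulled apart with a simple regular expression. The pattern below is only an illustrative sketch for the common Mozilla/<version> (<platform>) <tokens> layout, not a complete parser:

# Illustrative (incomplete) split of a browser UA into its main parts.
import re

UA_PATTERN = re.compile(r"^Mozilla/(?P<version>[\d.]+) \((?P<platform>[^)]*)\) (?P<tokens>.+)$")

ua = ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
match = UA_PATTERN.match(ua)
if match:
    print(match.group("platform"))  # Windows NT 10.0; WOW64
    print(match.group("tokens"))    # AppleWebKit/... Chrome/... Safari/...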

3.2 Compatibility and spoofing

A large part of the User-Agent string is concerned with compatibility. It all began in 1996, when Netscape (presenting itself as Mozilla in the UA) came with new features, including the idea of frames. A frame is a part of a web page that is able to load content independently of the rest of the document. Other browsers of that period did not support frames. This was the origin of User-Agent sniffing. Developers started to examine the UA string and they sent the content with frames only to Mozilla browsers. When other browsers (Internet Explorer, for instance) started to support these advanced features later on, they needed to refer to themselves as Mozilla to receive the correct content.

The same happened with rendering engines. Gecko was an engine developed by the Mozilla Project. To announce compatibility with the engine, other browsers started to add the "like Gecko" string to their UA. Konqueror was one such browser. In reality, it was based on another engine – KHTML, developed by the KDE project. Later, when WebKit was


created based on KHTML, the browsers using it (Safari and Chrome) started announcing compatibility with all three rendering engines [16]. User-Agent sniffing, along with the lack of reliable information and the large number of applications violating RFC recommendations, makes User-Agent analysis a rather challenging task.

3.3 User-agents in MU network

The number of different User-Agents used today is immense, and it still increases with the expansion of software on mobile devices connecting to the Internet. There is a limited amount of desktop programs that frequently connect to the Internet, and people usually use one or two browsers to access web servers. In the world of smartphones, it works differently. The most popular websites (Facebook, Instagram or various online games) are not accessed through the browser. Instead, they are accessed through dedicated applications, and these applications usually have specific User-Agents.

Count  User-Agent
6575   Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
4869   Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
4812   Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
4411   Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1
2585   Mozilla/5.0 (Windows NT 10.0; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0

Table 3.1: Popular browser-like UAs in MU network

32688 distinct User-Agent strings were collected on the network of Masaryk University on the 10th of March. Approximately 13000 of them belong to browsers. Table 3.1 shows the 5 most popular browser UAs and the number of different IP addresses using them.


The results are not surprising; most often, people were using some combination of Windows 7 (NT 6.1) or Windows 10 with the Firefox or Chrome browser. The second table presents the most commonly used non-browser User-Agents.

Count  User-Agent
4977   urlgrabber/3.10 yum/3.4.3
4123   Microsoft-CryptoAPI/10.0
3240   Microsoft-WNS/10.0
3033   Microsoft-CryptoAPI/6.1
2975   urlgrabber/3.9.1 yum/3.2.29

Table 3.2: Popular non-browser UAs in MU network

In the first place is a User-Agent belonging to the Linux software package manager YUM. Urlgrabber is its subpart – a Python package used for file fetching. This UA appears on the network during package installation on RPM-based Linux distributions. It is followed by Microsoft-CryptoAPI, indicating usage of the standard Microsoft cryptographic library. For instance, this library is used shortly after the start of the system, when Windows connects to its PKI repositories and updates its certificate revocation lists [17]. The last application not yet mentioned is the Windows Push Notification Services (WNS). It enables third-party developers to send updates from their own cloud services [18].

Although some User-Agent strings are common to many users, they represent only a minority. Only 0.6% of the intercepted UA types appeared with more than 100 distinct IP addresses. On the other hand, 23849 User-Agents appeared with only one IP address.
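Statistics like those in the tables above can be computed with a few lines of Python. The sketch below assumes flow records reduced to (source IP, User-Agent) pairs, which is a simplification of the real export format; the sample records are made up:

# Distinct source IP addresses per User-Agent string; 'flows' is a
# made-up stand-in for records read from the collector.
from collections import defaultdict

flows = [("10.0.0.1", "Microsoft-CryptoAPI/10.0"),
         ("10.0.0.2", "Microsoft-CryptoAPI/10.0"),
         ("10.0.0.2", "urlgrabber/3.10 yum/3.4.3")]

ips_per_ua = defaultdict(set)
for ip, ua in flows:
    ips_per_ua[ua].add(ip)

for ua, ips in sorted(ips_per_ua.items(), key=lambda item: -len(item[1])):
    print(len(ips), ua)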

4 Network monitoring

Network administrators usually want to have an overview of the traffic passing through their network. Network monitoring allows them to detect various anomalies, ongoing attacks or misconfigured devices. There are two main approaches to network monitoring: deep packet inspection and flow monitoring.

During deep packet inspection (DPI), whole packets, not only headers, are captured and used for analysis. Such an approach provides the administrator with detailed information about the traffic and allows enhanced detection methods. However, there are some disadvantages. It is computationally demanding, and if captured packets need to be stored, the requirements for storage space are immense. Moreover, it raises some privacy issues. Deep packet inspection tools are represented by programs such as tcpdump [19], Wireshark [20] or The Bro Network Security Monitor [21].

On the other hand, only packet headers are examined during flow monitoring. Packets are grouped according to IP addresses, ports and the protocol used to form a flow. Additionally, each flow contains information about the time when it started, how many bytes and packets were transferred, and other possible attributes. Classic flows are unidirectional, which means that a request and its response will be part of two distinct flows. Information that can be extracted from flows is not as detailed as in the case of DPI, but it still provides a useful overview of the network traffic. Analysis of flow data can provide answers to questions such as: "Which IP address generated the most traffic?" or "With how many distinct IP addresses did user A communicate?". Moreover, it allows for the detection of various anomalies, such as port scanning. The advantage of this approach is decreased demands on performance and storage.

The first widely adopted technology for flow monitoring was NetFlow, developed by Cisco. The latest version, NetFlow v9, was also adopted as an RFC standard [22]. This version also became the base for a new open standard, IPFIX (Internet Protocol Flow Information Export) [23]. The newest versions of both mentioned formats are template-based, so it is possible to extend basic flows with some higher-level information (e.g. DNS, HTTP).


Figure 4.1: Common architecture for flow monitoring [24]

Figure 4.1 illustrates how a network deploying flow monitoring may look. The device generating the flows, also known as an exporter, can be an ordinary router or a standalone device called a probe. Such a probe is usually connected to the network by a TAP (a device duplicating the traffic, positioned in inline mode) or a SPAN port (traffic passing through selected ports of the switch or router is mirrored to this port) [25]. Gathered flows are later sent to a collector, where they are stored and can be used for further analysis. A flow is sent to the collector only when it has already finished. To determine the end of a flow, two timeouts are used:

∙ Inactive timeout – If no data belonging to the flow was sent during a predefined period, the flow is considered terminated and it is exported.


∙ Active timeout – It is used to divide flows which last too long. If a flow lasts longer than the specified value, it is exported and any new incoming packets become part of a new flow.
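Both timeouts can be illustrated with a simplified piece of exporter logic; the constants are illustrative, as real probes make them configurable:

# Simplified exporter decision for the two timeouts (values are made up).
INACTIVE_TIMEOUT = 30   # seconds without a packet before export
ACTIVE_TIMEOUT = 300    # maximum lifetime of a single flow record

def check_flow(flow, now):
    """Decide whether a cached flow record should be exported."""
    if now - flow["last_packet"] > INACTIVE_TIMEOUT:
        return "export"              # the flow is considered terminated
    if now - flow["first_packet"] > ACTIVE_TIMEOUT:
        return "export_and_restart"  # split an overly long flow
    return "keep"

print(check_flow({"first_packet": 0, "last_packet": 290}, 310))  # export_and_restart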

There is a wide range of tools for flow monitoring, both open-source and commercial. YAF [26], nProbe [27] and Flowmon Probe [28] are examples of available exporters. In the role of the collector, nfdump [29] or IPFIXcol [30] can be used. In some cases, a single tool may incorporate the functionalities of both exporter and collector.

4.1 Experiment setup

All methods discussed in the thesis are verified practically on the network of Masaryk University. This section briefly describes the setup of the experiments, the tools used and possible shortcomings. Extended HTTP flows are obtained using a Flowmon probe processing raw packets. The probe is able to create ordinary network flows, but also to extend them with information from various application level protocols, HTTP being one of them. The probe exports the HTTP method, URL, Host, User-Agent and Referer headers from the request message, as well as the Status Code and Content Type from the response. Exported flows can be either unidirectional or bidirectional (i.e. an HTTP response is associated with its request). Further, the probe can either create a new flow for every HTTP request, or aggregate requests belonging to the same TCP connection. In the latter case, the URL, Referer and other fields are taken from the first request. The data used was obtained from two sources with slightly different settings.

1. One probe, on the edge between the university network and the Internet, monitoring part of the incoming/outgoing traffic. It exports collected flows in JSON format, which is later processed by a Python script (flowProcessing.py). This probe is set to create bidirectional flows, and these flows are divided based on HTTP data (one flow = one request).

2. A group of probes in different parts of the network, monitoring all traffic leaving from or coming to the university network. Exported flows from all probes are sent to the collector (IPFIXcol)


and they are stored in an fbitdump [31] database. Flows are unidirectional and all HTTP requests of the same TCP connection are aggregated into one flow.

The source of the data is chosen based on the requirements of a specific detection method. During the detection of brute force attacks, it helps to know the response code assigned to each request, so the first setup is used. When observing network scanning, it is interesting to see the URLs of all sent requests. Again, this can be achieved using the first setup. In all other cases, the second option is used.

4.2 Shortcomings of monitoring method

When a packet containing an HTTP header is fragmented, the probe does not reassemble the fragments, and the information gathered about the HTTP request remains incomplete. This usually happens when the sent URL is excessively long. Effects of this fragmentation were observed during the User-Agent analysis. Sometimes, strings obviously belonging to a valid browser UA appear, but they are cut at a seemingly random place:

Mozilla/5.0 (compatible; MSIE 9.0; Windows
Mozilla/5.0 (compatible; MSIE 9.0; Wind
Mozilla/5.0 (compatible; MSIE 9.0; W
Mozilla/5.0 (compatible; MSIE 9.
Mozilla/5.0 (compatible; MSI
Mozilla/5.0 (compatible; M
Mozilla/5.0 (compatib
Mozilla/5.0 (compat

The existence of these shortened UAs may slightly influence the performance of some detection methods. For example, it may seem that a device used a large amount of different UAs, while in reality they are all the same. There is no special flag indicating the affected flows, so it is impossible to exclude such requests automatically.
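One heuristic that could partly mitigate the problem – it is not part of the thesis implementation – is to treat a UA string as a probable fragment when it is a strict prefix of another UA observed on the network:

# Heuristic sketch (not from the thesis scripts): flag UA strings that
# are strict prefixes of another observed UA as probable fragments.
def probable_fragments(user_agents):
    uas = sorted(set(user_agents))  # a prefix sorts directly before its extensions
    return {shorter for shorter, longer in zip(uas, uas[1:])
            if longer.startswith(shorter)}

seen = ["Mozilla/5.0 (compatible; MSIE 9.0; Windows",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
        "curl/7.47.0"]
print(probable_fragments(seen))  # only the truncated MSIE string is flagged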

5 Network Scanning

Many different attacks have their first step in common – network scanning. Often, the attacks are not targeted at a particular application on a specific server. On the contrary, the attacker knows about some vulnerabilities that he can exploit, and he searches for an opportunity. Suppose a new WordPress (a widely adopted content management system [32]) vulnerability was revealed. The attacker, or some infected devices under his control, performs a scan of selected networks. He searches for devices with open ports running an HTTP server. Once he has the list of HTTP servers, he can contact each of them while trying to find out whether they are running an instance of WordPress. This can be achieved by searching for files which are standard for WordPress (for instance wp-login.php). After a successful scan, he can start with the exploit itself.

Port scanning A series of simple messages sent to well-known ports of one or multiple devices in the network. The goal is to reveal running services on the device. Port scanning can be divided into two categories: horizontal and vertical. During a horizontal scan, multiple IP addresses are scanned for a specific port number. On the other hand, a vertical scan targets numerous destination ports on a single host. Both techniques produce specific traffic patterns on the network, which makes them relatively easy to detect, and even the information contained in basic network flows is sufficient to reveal potential scanners [33].

HTTP scanning This term is often used in the context of antivirus software, where it denotes the inspection of received HTTP traffic in order to find potentially malicious content. However, for the purpose of this thesis, the term HTTP scanning is used to describe behaviour similar to horizontal port scans, but on the application level. It is a search for a resource with the same name (same path) on a large number of distinct servers. For instance, an attacker can search for a WordPress login page in order to perform a brute force attack. Ordinary network flows do not contain enough information to reliably identify these scans. However, when flows are extended with HTTP fields, the detection becomes straightforward.


Directory traversal This attack can be considered an analogy to the vertical scan. It is done by searching for a concrete file on various paths of one server. The attacker usually tries to locate commonly used operating system files like /etc/passwd that are outside of the web server’s root directory. If the application does not properly validate user-supplied filenames, the attacker can use the sequence ../ to access a parent directory [34].

5.1 HTTP scanning

5.1.1 Incoming traffic

An example of a successful scanning detection method was presented in 2015 after research conducted on the MU network [35]. The authors assumed that guests usually access only a limited number of hosts on the MU network, and that the requested paths differ from host to host. They defined an HTTP scan as follows:

HTTPScan(A) ⇐⇒ A = {F | ∀F, F′ : F(srcip) = F′(srcip) ∧ F(dstip) ≠ F′(dstip) ∧ F(path) = F′(path)} ∧ |A| > threshold

In other words, their method counted the number of distinct destination IP addresses the user contacted with the same path set in a request. If the value crossed the threshold, it was considered a scan. The authors claimed that with the threshold set to one fifth of the number of web servers in the network, the method achieves good results, and they did not observe any false positives.

The same experiment was conducted as a part of this thesis with a slightly modified method. Instead of destination IP addresses, the method counts the number of distinct hosts. This number may be higher because more virtual hosts can be scanned on one IP address. Two special paths are omitted from the calculations: the root directory (/) and /robots.txt. This file is commonly accessed by web crawlers because it is designed to give them instructions about files that crawlers are authorised to access. The experiment uses data collected on the 31st of March by the single probe on the edge of the university network. The probe only collects data from one part of the network, and so the threshold is set to only


20. The results confirm the success of the method. It revealed 6 malicious users and one probable false positive. The attackers were looking for various resources, including:

/wp-login.php
//cgi-bin/test.cgi
/xmlrpc.php
/wordpress/

The highest number of distinct paths searched by one attacker was 15; all were related to WordPress. The highest number of scanned hosts was 53. It seems to be a low value, but one should take into account that only a small portion of the network is monitored by the probe used. The fact that the whole HTTP URL (i.e. path + query) was used for the analysis revealed one attacker with a different motive. He contacted 47 servers with the following request:

/index.php?pg=ftp://genesys:[email protected]/envi.php?

Suspicious PHP code is still present at the specified location. The attacker probably wanted to find vulnerable servers that would download and run the file. Although this behaviour is not a network scan, it is definitely malicious.
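The counting at the core of the method can be sketched in a few lines of Python. The sketch assumes flows reduced to (srcip, host, path) tuples and is a simplification of the attached scanDetection.py script, not its exact code:

# Simplified incoming-scan detector: count distinct hosts contacted
# by each source IP with the same path.
from collections import defaultdict

THRESHOLD = 20
IGNORED_PATHS = {"/", "/robots.txt"}  # legitimately requested everywhere

def detect_scans(flows):
    hosts = defaultdict(set)          # (srcip, path) -> set of contacted hosts
    for srcip, host, path in flows:
        if path not in IGNORED_PATHS:
            hosts[(srcip, path)].add(host)
    return [(srcip, path, len(h))
            for (srcip, path), h in hosts.items() if len(h) > THRESHOLD]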

5.1.2 Outgoing traffic

If the method from the previous section is applied to outgoing traffic, it generates too many false positives. It is hard to predict how many distinct servers are visited by an ordinary user throughout one day. Moreover, there are files located on a standard path of many servers and regularly requested by users. favicon.ico is an example of such a file. It is usually located in the root directory and it contains a small icon associated with the website, which is usually displayed in the browser’s address bar.

To adapt to different conditions and to decrease the number of false positives, the original method was modified in this thesis. The changes are based on two main assumptions. Firstly, an attacker usually does not fill in the Referer field during a network scan, while in normal traffic the field is usually set. Of course, this does not apply to 100% of cases and it can lead to unnoticed scans, but on the other hand, it significantly decreases the number of false positives. The second assumption is

29 5. Network Scanning that a normal user visits more subpages of the server he accesses (at least additional requests for JavaScript and CSS files). On the other hand, the attacker sends just one request to each host during the scan. Actually, he can send more requests if he scans more distinct paths, but he probably does not visit any page out of his scanning list. The new detection method mimics the behaviour of the original method and creates a list of potential attackers in its first step. However, only requests without Referer are taken into account. Then, a scanning list is created for each attacker, which is a list of paths he visits repeat- edly. In the next step, the method checks the number of distinct paths for each attacker and each visited hostname. If the user only accesses paths from his scanning list on most of the visited websites, he is considered malicious. On the contrary, if he requests various other resources on most of the websites, he is considered benign. Method implementation can be found in file scanDetection.py which is part of electronic attachment.

User  Number of hosts  Requested path
A     1221             /wp-login.php
A     241              /xmlrpc.php
B     27944            /wp-login.php
B     14512            /xmlrpc.php
B     58               /cgi-sys/suspendedpage.cgi
B     109              /index.html
B     109              /not_found
B     86               /wordpress/wp-login.php
B     100              /wp/wp-login.php
C     35               /t51.2885-19/11906329_960233084022564_1448528159_a.jpg
D     36               /favicon.ico
E     39               /server-status

Table 5.1: Scans of the external network

Table 5.1 presents scans of the external network detected with the threshold set to 30. Only the last three events are probable false positives and they can be eliminated by slightly increasing the threshold. A problem of this method is that for extensive scans (>10,000 hosts), it is time-consuming to count the number of distinct visited paths on each host. However, these extensive scans probably do not need to be verified at all, and the verification can be made only for smaller scans that could be mistaken for legitimate traffic.

5.2 Directory traversal

Discovery of this type of attack is straightforward. It is enough to search for the sequence ../ (or possibly ..\ on Windows-based systems). However, initial experiments show that not every occurrence of this sequence in the URL is malicious. Two representative examples of non-malicious requests are explained below:

is.muni.cz/predmety/predmet.pl?id=340448;zpet=../predmety/katalog.pl

The sequence ../ is placed in the query part of the URL. The server saves the path from which a user accessed this subpage into the variable zpet. The saved path is used later, once the user clicks on the link Back to the previous page. There is nothing malicious about such a URL, as this path traversal was the intention of the server. The second example introduces another regular use of this sequence:

www.parea.sk/sidebar/loga-partneru/image_sidebar.php?file=../../images/loga/tatralandia.jpg

When someone visits the site www.parea.sk, his browser automatically sends this request, because the original page uses the ../ notation in its source code to refer to images.

Although a reference to the root directory may appear in the URL, it does not happen frequently. Moreover, attackers usually need to traverse more directories during the attack, so the sequence is repeated. However, no ordinary request that was caught used more than two repetitions. The created detector uses a regular expression to search for at least 3 consecutive sequences. Over 24 hours, it captured 136 requests and all of them were part of malicious activity. A short example of captured requests follows:

portal.dis.ics.muni.cz/cms/index.php?page=../../../../../../../../../etc/passwd
portal.dis.ics.muni.cz/static/../../../../../../../../../etc/passwd.
portal.dis.ics.muni.cz/..%5c..%5c..%5cboot.ini

The most frequently accessed files were /etc/passwd, boot.ini and wp-config.php.
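A check along these lines can be written in a few lines of Python. This is a hedged sketch, not the exact regular expression used by the detector; it additionally decodes %-encoding first so that variants such as ..%5c are caught.

import re
from urllib.parse import unquote

# Flag URLs containing at least three consecutive "../" (or "..\") sequences.
TRAVERSAL_RE = re.compile(r'(\.\./|\.\.\\){3,}')

def is_traversal(url: str) -> bool:
    return bool(TRAVERSAL_RE.search(unquote(url)))

For instance, the third captured request above matches only after the %5c sequences are decoded to backslashes.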

5.3 Summary

The chapter describes how to detect HTTP scans with the help of extended network flows. If a security administrator intercepts a scan of an internal network, he can block the responsible IP address and so prevent a possible attack that may follow the scan. Detection of scans in this direction is simple, but the chapter also presents a modified method targeted at outgoing traffic. The method successfully revealed devices on the university network that were scanning external IP addresses. Such behaviour indicates either a malicious user on the MU network or, more probably, an infected device. Later, section 6.3 will prove that at least one of these devices was infected.

The last part of the chapter presents a slightly different concept. Directory traversal is a vulnerability that allows the attacker to access restricted files. It appears in this chapter because it resembles a vertical scan. The performed experiment has shown that the directory traversal attack occurs often and that it can be successfully revealed using a simple regular expression.

6 Brute Force Attacks

A large number of web pages provide a way to authenticate the user. The most straightforward method is to use a combination of username and password. When users demonstrate the knowledge of the right combination, they are authenticated and they can access secured web pages. This authentication method can become a target for attackers trying to guess the correct combination of username and password. When the attacker systematically checks all possibilities, it is called a brute force attack. This type of attack is resource and time consuming. To try all possibilities for passwords which are 8 characters long and only use lower-case letters of the English alphabet, the attacker needs to test 26^8 = 208,827,064,576 passwords. Such an attempt is easily detectable because a normal user will not send such a large amount of requests to one website.

However, there are more efficient ways to perform a similar attack. The password picked by the user is usually not a random string. Passwords like "12345", "admin", "password" or "qwerty" are very common even though it is evident they are insecure. The only thing an attacker needs to do is to find a list of the most commonly used passwords and try it. This strategy is known under the name dictionary attack. It reduces the number of guesses the attacker needs to attempt to thousands or even hundreds. As a result, the detection is not as straightforward.

This chapter relies on two research papers focusing on HTTP flows. The first one [35] was conducted at Masaryk University. The authors have shown that when flows are aggregated on the source and destination IP address and URL (host + path), records with an excessively large number of flows (a few thousand) can mostly be attributed to brute force attacks or to a communication between the client and the proxy server. However, dictionary attacks using a small number of requests (several hundred) will stay hidden in legitimate traffic. The second related paper [36] suggests using the size of HTTP requests to further narrow down the criteria to detect potentially malicious traffic.

6.1 Targeted authentication methods

To propose a successful detection method, it is essential to distinguish between the different types of authentication that can be abused by the attacker. Possible authentication mechanisms are as follows:

∙ Native HTTP authentication – The authentication process is part of the HTTP protocol and it comes in two variants, Basic and Digest authentication. It uses a dedicated window external to the web page itself which requests user credentials. These credentials are placed in the request header, and the information regarding whether the process was successful or not is contained in the status code.

∙ Form-based authentication – This is the most widely used authentication method. Credentials are entered into web forms and processed by the application running on the server. It is up to the developer whether the content of these forms will be sent using a GET or POST request. POST is more secure and it is used significantly more often.

∙ XML-RPC – It allows making remote procedure calls over the Internet. A procedure call is incorporated in an XML file sent through HTTP. This functionality can be found in WordPress, for instance. It is not primarily used for user authentication, but it can be misused by the attacker. Some XML-RPC calls require the username and password and then they provide the user with a confirmation of whether the credentials are correct [37].

6.2 Attacks against HTTP based authentication

This category of attacks is the easiest to detect. Status code 401 Unauthorized is returned by the server if the user attempts to access restricted web content without an Authorization header, or if the header contains invalid credentials. A reliable indicator of compromise is a large amount of HTTP flows from the same source IP address to the same resource (host + path) with a 401 response.

All proposed detection methods for brute force attacks were evaluated on data collected throughout 3 days (30. 3. – 1. 4. 2017) on one probe located at the edge of the MU network. The probe was set to create bidirectional flows. These are indispensable, as the status code assigned to the request is needed for the detection. The threshold on the number of flows was set to 100. On the 30th of March two events were detected (see table 6.1). On both specified URLs there was actually a prompt for authentication. Results from the other two days show a similar outcome.

Flows  From        To          Path
189    MU network  Internet    /chat/chat/channels64/3349
1009   Internet    MU network  /admin/advanced

Table 6.1: Detected attacks against native HTTP authentication
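This check is simple enough to express directly over flow records. The sketch below is illustrative only and assumes the same flow-dictionary fields as the earlier sketches, plus a status field with the HTTP status code.

from collections import Counter

def detect_http_auth_bruteforce(flows, threshold=100):
    # Count flows per (source IP, host, path) answered with 401 Unauthorized.
    counts = Counter(
        (f["srcip"], f["host"], f["path"])
        for f in flows
        if f["status"] == 401
    )
    return {key: n for key, n in counts.items() if n > threshold}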

6.3 Attacks against authentication using POST method

Malicious authentication using the POST method is the most common; therefore, the detection of these kinds of attacks is given the most attention. The initial strategy is based on the following assumptions:

∙ Form-based authentication is not a part of the HTTP protocol. Even a request with invalid credentials is a valid HTTP request. The expected status code is 200 OK.

∙ Only a small amount of information needs to be delivered to the server. In the most simple case, the content of only two input fields needs to be sent, therefore the number of sent bytes will be limited. This assumption is supported by [36]; in their study, the authors tested various brute force attacking tools against the most popular content management systems and measured the number of bytes and packets the tool used. They estimated the lower (363) and upper (1130) boundaries on the number of sent bytes.

∙ During one attack, the number of sent bytes per request is stable. It may vary slightly according to the size of the username and password.


∙ The number of bytes sent in a response should stay stable as well. The server answers with the same HTML page announcing that the password was incorrect.

To estimate the variability of the number of sent bytes, the coefficient of variation is used as a metric. It is a statistical measure of the dispersion of data points around the mean, defined as the ratio of the standard deviation to the mean [38]:

cV = σ / µ

For this task it performs better than the variance. As it is relative to the mean, it can be used to compare the spread of datasets that have different means (web pages of different sizes). Prior to computing cV, outliers were eliminated from the dataset using the Z-score test [39]. This step prevents the computed variability from being influenced by a small number of unexpectedly longer or shorter flows. In the first attempt, the cV threshold was set to 10% for both incoming and outgoing bytes. The detection method was applied to data collected on the 31st of March. If the number of flows from one IP address to one resource exceeded 100 and all mentioned conditions were satisfied, the flows were marked as a potential attack.
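Computed over the byte counts of the aggregated flows, this looks roughly as follows. The Z-score cut-off of 3 is an assumption for illustration; the thesis does not state the exact value used.

import statistics

def coefficient_of_variation(byte_counts, z_cutoff=3.0):
    # Remove outliers whose Z-score exceeds the cut-off (assumed value),
    # then return the standard deviation divided by the mean of the rest.
    mu = statistics.mean(byte_counts)
    sigma = statistics.pstdev(byte_counts)
    if sigma == 0:
        return 0.0
    kept = [b for b in byte_counts if abs(b - mu) / sigma <= z_cutoff]
    return statistics.pstdev(kept) / statistics.mean(kept)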

Result of the first experiment In total, 49 events were reported; 14 against servers inside the university network and 35 potential attacks coming from the university to the Internet. Only one incoming attack seems to be a false positive – 1046 requests to the server loschmidt.chemi.muni.cz. On this server, a publicly available classifier for the prediction of the effect of amino acid substitutions is running, which explains the large number of POST requests. The 13 other attacks were either against the XML-RPC service or against a WordPress login page situated on the standard path /wp-login.php. The largest of the attacks consisted of 10090 requests against the server blokexpertu.cz/xmlrpc.php.

Most of the outgoing flows marked as attacks were false positives. Among them, there were a few game servers, advertisement-related servers and some unclassifiable websites. Only one event seemed suspicious – 121 requests for velatice.cz/index.php. There is a login form on the requested path of the website.


Manual attack search In the next step, POST requests between a single host and a resource appearing more than 100 times in the same day were reviewed manually. Because of the large volume of data, only requests towards the university network were checked to find attacks that could have been overlooked by the detection method. The manual inspection revealed 24 probable attack attempts, twice as many as the automatic method. The main characteristics of the undetected attacks are summarised in the table below.

Flows  Src  Host                       Path           Status  Request bytes    Response bytes
                                                      code    median  cV       median  cV
235    A    www.socstudia.fss.muni.cz  /wp-login.php  404     918     19.7%    34506   1.6%
122    B    psych.fss.muni.cz          /wp-login.php  404     988     10.8%    45103   0.4%
239    B    psychoterapie.fss.muni.cz  /wp-login.php  404     520     0.4%     3088    0.0%
240    B    socstudia.fss.muni.cz      /wp-login.php  404     675     8.1%     15088   0.0%
250    C    impact.cjv.muni.cz         /xmlrpc.php    301     681     0.4%     732     0.0%
420    D    katedry.ped.muni.cz        /wp-login.php  301     586     0.6%     543     0.0%
389    E    90.muni.cz                 /wp-login.php  403     820     1.2%     554     0.05%
10104  F    blokexpertu.cz             /xmlrpc.php    200     1306    0.5%     866     0.0%
103    G    koncepce.knihovna.cz       /wp-login.php  200     551     5.0%     5969    11.3%
7277   H    blokexpertu.cz             /xmlrpc.php    200     1286    0.3%     866     0.06%
803    I    polit.fss.muni.cz          /xmlrpc.php    200     85526   0.4%     62012   3.1%

Table 6.2: Attacks unnoticed during the first experiment

The first four events in the table seem to be unsuccessful attack attempts. The attacker probably wanted to attack the WordPress login page without checking that this login page is actually present on the provided path. The fact that the attacker tried the same thing on 3 different sites reinforces the idea that his behaviour is malicious. In the next two lines, the response code is 301 Moved Permanently; requests for these resources are redirected to use the HTTPS protocol. Whether the attacker later tried the attack through HTTPS cannot be verified, since HTTPS data was not collected. The next four events just slightly crossed the thresholds on either the stability or the median of the amount of bytes. However, the last event crossed the threshold on sent bytes significantly. According to [40], it can still represent an attack. There is a possibility that the attacker used the XML-RPC system.multicall method, which allows executing multiple methods inside a single request, so the attacker can make hundreds of guesses inside one request.


To cover all these attacks, the original criteria needed to be loosened. However, that brings additional false positives.

Similarities between attacks To increase the accuracy of the detector, all captured attacks were analysed to find their common properties.

∙ Timing – The duration of the attacks varies between 10 minutes and 20 hours, and the frequency of requests from 10 per second to 20 per hour. As a result, timing was not chosen as a metric.

∙ Referer – During all observed attacks, the Referer header was either not set or it was always the same. This assumption may reduce false positives related to advertisement tracking sites. Requests to such sites often differ in the Referer field.

∙ Content Type – The answer to an unsuccessful login is normally a simple HTML page, or an XML file in the case of an XML-RPC attack. In both cases, the content type contains the string "text". An example of a false positive that will be excluded if the Content Type is checked is the request for the application on loschmidt.chemi.muni.cz returning the JSON format.

∙ Number of visited paths – When the attacker performs a brute force attack, he usually focuses only on one resource on the server. He does not use a browser, so additional resources (JavaScript or CSS files) are not automatically requested, and he does not browse other subpages. The disadvantage of this assumption is that it does not hold for non-automatic attacks where the attacker first explores the website and then chooses an attack technique.

∙ Missing similar requests from other users – If a specific resource is accessed repeatedly with POST requests from a larger number of distinct IP addresses, it is most likely not an attack. It is probable that there is some specific service running on the target server that requires repeated POST requests. For example, 30 different addresses that requested dmd.metaservices.microsoft.com/dms/metadata.svc more than 50 times a day are not considered malicious. On the other hand, if no other device performs a large amount of POST requests for the resource, the one that does is suspicious. This assumption may limit the disclosure of distributed brute force attacks, so it is better to apply it only to outgoing traffic.

Adapted detection method The information gathered during the previous steps was used to improve the detection. The enhanced detector uses a threshold of 20% for the coefficient of variation in the number of bytes per request, 15% for bytes per response, and no upper boundary on the message size. The Referer needs to either not be set or to always contain the same value, and the string "text" should be present in the response Content Type. The detector checks for status codes 200, 403, 301 and 404 (it is questionable whether to check for attacks on non-existing resources, but such attempts are still a sign of malicious intentions, so it may be useful). The data obtained in the first step is further analysed to limit false positives. The event is not reported if the same resource was contacted by at least 5 distinct devices more than 50 times, or if the source IP address contacted at least 10 distinct paths on the target host. The method was tested on data gathered throughout 3 days and the results are presented in table 6.3. The improved method missed almost no known attacks and, at the same time, significantly decreased the number of false positives.

                        Incoming traffic                    Outgoing traffic
Date            Events  True pos.  False pos.  False neg.   True pos.  False pos.
30th of March   32      10         0           2            1          21
31st of March   38      24         0           0            1          13
1st of April    21      10         0           1            0          11

Table 6.3: Results of brute force detection using the enhanced method
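The conditions of the enhanced detector can be summarised in one predicate over an aggregate of flows sharing the same source IP, host and path. The sketch below uses hypothetical field names and the coefficient_of_variation helper from above; it illustrates the rules, it is not the actual attachment code.

def is_post_bruteforce(agg):
    # agg: one (source IP, host, path) aggregate of bidirectional flows;
    # all field names below are illustrative assumptions.
    return (
        agg["flows"] > 100
        and agg["status"] in (200, 301, 403, 404)
        and coefficient_of_variation(agg["request_bytes"]) < 0.20
        and coefficient_of_variation(agg["response_bytes"]) < 0.15
        and len(agg["referers"]) <= 1                 # unset or always the same
        and "text" in agg["content_type"]
        and agg["paths_on_host"] < 10                 # attacker sticks to one resource
        and agg["other_heavy_clients"] < 5            # applied to outgoing traffic only
    )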

On the 30th of March, interesting behaviour was revealed. The 424 requests for uromatalieslave.space/index.php are not caused by a brute force attack, but they are still malicious. It is the communication of an infected device with its command and control centre. The device is infected by the Sathurbot backdoor trojan [41], which is spread through a Baywatch 2017 movie torrent file. The malware was discovered because of the host it contacted. This malware generates a list of hostnames and then probes them for the presence of WordPress. Once it finds one, it tries to log in with predefined credentials. According to the analysis by ESET [41], it tries only one password and moves on. However, the captured data indicates that the malware contacts the discovered login page several times, but never more than 40 times, which is the reason why these attempts were not captured by the brute force detector. On the other hand, the first part of its attack was revealed by the HTTP scan detector.

6.4 Attacks against authentication using GET method

In the case of a brute force attack against GET forms, the interesting flows are the ones that share the same IP address, host and path. The query part of the URL changes because the credentials are placed there. An attack URL will contain at least two parameters (username and password), but most likely it will not be much longer. A login form usually appears on the main page of the website and no special parameters are part of the URL; 150 characters seems to be a reasonable upper boundary. Similarly to the previous attack category, the expected length of the answer should not vary too much (cV < 10%) and the expected answer is 200 OK. The assumptions about the Referer and Content Type from the previous section are also applicable. When applied to the data from the 31st of March, the detector reported 46 events. Several suspicious URLs are given below:

/mshowitemsajax.php?tuser=sdhr1192&d=1490991182996&ver=4 /mshowitemsajax.php?tuser=1919ayut&d=1490990112326&ver=4 /mshowitemsajax.php?tuser=1919ayut&d=1490988895764&ver=4

/kurikulum/index?do=login&sectok=8501bcd344e53fb92792ba6d8167ae14
/kurikulum/index?do=login&sectok=6a9e4f5a48015b09d1a82a1a050beb0d
/kurikulum/index?do=login&sectok=6d61e4f81481648ca7e78ed48b66ff5f

All of them are benign and do not represent any authentication attempt. To refine the detection method, examples of real-world attacks are indispensable. A manual inspection of the flows used in the previous section is not feasible due to a much larger amount of data. Instead, only repeating flows containing the strings "pass", "login" or "user" were reviewed manually. This inspection of the flows from 3 days revealed


only one brute force attack on the 30th of March with URLs of the following format:

/phpmyadmin/index.php?pma_username=root&pma_password=host /phpmyadmin/index.php?pma_username=root&pma_password=123456 /phpmyadmin/index.php?pma_username=root&pma_password=backup

The example above confirms all the assumptions made before and therefore validates them as correct. Unfortunately, one attack example is not enough to confirm or reject the success of the detection method. Because this type of attack is not as popular as the POST form attack, real-world examples are missing in the dataset. Moreover, the attack URL is almost indistinguishable from a valid HTTP request. For these reasons, this thesis does not focus further on this type of attack.

6.5 Summary

To conclude, HTTP extended flows can significantly help with the detection of brute force attacks. Although the method was not very successful with attacks against GET form authentication, it behaved much better with the most popular authentication, based on POST forms. It is impossible to determine the exact precision and recall values for the method in the absence of labelled data, but at least for incoming traffic, the method managed to reveal highly probable attacks with almost no evident false negatives. The performance of the method may be increased if it is adjusted to a specific network, for example by whitelisting; however, it will never be able to reveal all possible types of brute force attacks.


7 Code injection

Web applications usually process some type of user input, for example values filled in forms. Every input should be considered untrusted and thus properly validated. If the application programmer does not take sufficient precautions, an attacker can create a malicious input that is able to modify the application logic. The inserted code can help the attacker get access to private data, harm subsequent users of the application or even gain total control over the server.

To achieve code injection, the attacker can try to insert data in a different format than expected, include characters which have a special meaning in a particular programming language, or send overly long strings. This malicious data can be placed in HTTP forms and sent through a POST request, but it can also be placed in a URL query parameter or in some HTTP headers. Extended HTTP flows allow the detection of such behaviour if the malicious data is part of the URL, User-Agent or Referer. There are multiple variations of code injection attacks. The most popular ones, SQL injection and cross-site scripting, are analysed in this chapter.

7.1 SQL injection

Most web applications use databases and run an SQL (Structured Query Language) server. These servers are the targets of SQL injection attacks. During such attacks, malicious SQL commands are sent instead of the ordinary user input. If the input is not properly processed, the attacker can obtain, delete or modify sensitive data in the database. Although the main idea of the attack has been known for almost 20 years (it was first documented in 1998 in the online magazine for hackers Phrack [42]), a large number of susceptible servers is still deployed worldwide. The SQL injection vulnerability is regularly ranked among the top 10 most critical web application security flaws. The list is created every few years by The Open Web Application Security Project (OWASP) [43].


7.1.1 Basic principle

A simple login form is taken as an example in order to describe the basic idea of the attack. The user inserts his username and password, and the content of the fields is sent to the server. The code running on the server side connects to a database and calls a query similar to:

SELECT * FROM userInfo WHERE user = '$username' AND pass = '$password'

If an ordinary user types his credentials, a harmless query is sent to the database. When the database returns an entry, the user is considered authenticated. However, an attacker can insert a malicious string into the login form, for example ' OR '1'='1' --. The final query completely changes its meaning:

SELECT * FROM userInfo WHERE user = '' OR '1'='1' --' AND pass = '$password'

The double dash indicates a comment, so the last part of the query is ignored. The logical formula becomes always true and the database returns entries even though the password is not correct.

7.1.2 Attack types

SQL injection can be detected with the help of regular expressions (RE), but to make them efficient, one needs to know what possible attacks can look like. A nice overview can be found in [44]. The authors defined the following attack categories:

∙ Tautologies – The attacker transforms a condition into a tautology in order to force the database to return all table rows. A demonstrative example was presented in the previous section.

∙ Illegal/Logically Incorrect Queries – The attacker inserts a query that will cause a database error on purpose. If such an error is propagated to the user, it can reveal important information about the type of the database or the names of the tables.

∙ Union Query – The UNION operator enables the attacker to get data from an arbitrary table in the database. Example: UNION SELECT cardNo FROM CreditCards WHERE acctNo=10032 --


∙ Piggy-Backed Queries – The attacker does not transform the original query; instead, he closes it and inserts a new one. The new query can be any SQL command, including a call of a stored procedure. Example: '; DROP TABLE users --.

∙ Blind injection – This type of attack is used when the application does not provide sufficient visible feedback from the database. The attacker observes the differences in the web page behaviour when a condition evaluates to either true or false, and then he can ask yes/no questions. A special type is a timing attack, when the attacker inserts a time delay in order to distinguish between two possible outputs. In the following example, if the current user is root, the server response will be delayed by 5 seconds. Example: 1 AND IF(SUBSTRING(USER(),1,4)='root', SLEEP(5), 1) --

∙ Alternate Encodings – Attackers use various types of encoding to avoid detection. For example, char(120) is a different representation of the character "x", and the string exec(0x73687574646f776e) is interpreted as exec(SHUTDOWN).

7.1.3 Detection

To prevent SQL injections on the server, the developers need to recognise any potentially malicious string. However, attack detection is simpler than prevention. Attackers usually use a combination of multiple approaches to reach their objectives, and if any of the sent strings are caught by the detector, the attackers are exposed.

The first conducted experiment used a regular expression taken from scripts written for The Bro Network Security Monitor [21]. The RE checks for quotes and various reserved words such as SELECT, DELETE, UNION, AND, OR, CONVERT and so on. Throughout one day, more than 4000 possibly malicious URLs were detected. Approximately half of them proved to be benign. An interesting category of false positives were the 500 requests to the server query.yahooapis.com originating from various IP addresses. The following example shows part of the URL after removing the %-encoding:

&q=select * from partner.news.feeds where category = ’health’ and region = ’gb’


Yahoo provides developers of applications with a single interface to access data across the web. URLs sent by such applications often contain sequences of the Yahoo Query Language (YQL) [45] that have an SQL-like syntax. It is difficult to automatically distinguish such requests from SQL injection attempts. Therefore, the best solution is to put the server query.yahooapis.com on a whitelist. Besides queries using YQL, the regular expression chosen for the testing has also marked requests related to the European Union as suspicious, as well as many others containing reserved SQL words.

Part of the thesis is a new detector that enhances the Bro regular expressions. The three most important modifications are:

1. The regular expression matches suspicious words even if some characters are %-encoded.

2. Queries in SQL contain more than one special word. UNION is used with SELECT, SELECT with FROM, and so on. This fact is taken into account to reduce the number of false positives.

3. A host whitelist is introduced. For now, it contains only one rule, which excludes requests for hostnames containing the substrings query and yahoo.

The modified regular expressions are included in the electronic attachment (modifiedRE.txt), and they can be run using the created Python script patternMatching.py. Table 7.1 shows the comparison between the two methods. The improved regular expressions revealed more than 150 new attack attempts and significantly decreased the number of false positives.

             True positive  False positive
Bro RE       2365           2458
Modified RE  2631           50

Table 7.1: Comparison of methods used for SQLi detection
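The three modifications can be illustrated with a condensed sketch. The expressions below are deliberately simplified stand-ins for those in modifiedRE.txt; only the overall structure (decoding, keyword co-occurrence, host whitelist) follows the described detector.

import re
from urllib.parse import unquote

# Modification 3: whitelist hosts containing both "query" and "yahoo".
WHITELIST_RE = re.compile(r'(?=.*query)(?=.*yahoo)', re.I)
# Modification 2: require co-occurring SQL keywords, e.g. UNION with SELECT
# or SELECT with FROM, instead of matching single reserved words.
SQLI_RE = re.compile(r"union\s+(all\s+)?select|select\s+.+\s+from", re.I)

def is_sql_injection(host: str, url: str) -> bool:
    if WHITELIST_RE.search(host):
        return False
    # Modification 1: decode %-encoding before matching.
    return bool(SQLI_RE.search(unquote(url)))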

URLs sent in HTTP requests do not have any restrictions on their length. However, it may be convenient to store only a part of the URL with a fixed length to reduce the requirements for storage space. Moreover, shorter URLs can be processed faster by REs. The next experiment determines how many characters from the URL are usually needed to detect an SQL injection. The results presented in figure 7.1 suggest that 128 characters are enough. With the URLs shortened to only 128 characters, the detector was still able to reveal more than 98% of the malicious traffic. However, with shorter samples, the success rate diminishes quickly.

Figure 7.1: Success of the detection method on URLs of various length

The conducted experiments cannot prove that the chosen method detected all attacks on the network, because the number of requests throughout a day is too large to allow manual inspection. However, they showed that a significant number of injections can be detected with the help of the proposed regular expressions at a reasonable false positive rate (~1%). Moreover, the second experiment confirmed that the full length of the URL is not indispensable for the detection.
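The truncation experiment reduces to one measurement per prefix length: the fraction of known-malicious URLs still caught when only a prefix is kept. A hypothetical helper for it might look as follows; the detector argument can be any of the checks sketched earlier.

def detection_rate(malicious_urls, detector, prefix_len):
    # Fraction of known-malicious URLs still flagged by `detector`
    # when each URL is truncated to its first `prefix_len` characters.
    caught = sum(1 for url in malicious_urls if detector(url[:prefix_len]))
    return caught / len(malicious_urls)

# e.g. detection_rate(urls, lambda u: is_sql_injection("", u), 128)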

7.2 Cross-site scripting

XSS is another popular attack that makes use of insufficient input validation. However, its main target is not an HTTP server, but a client browser. The attacker creates malicious client-side code (usually JavaScript) and tries to insert it into a web page viewed by other users. This attack may appear in two forms: non-persistent/reflected and persistent. During a reflected attack, the attacker creates a URL with the malicious code, for instance:

http://vulnerable.com?search=<script>...</script>

The attacker then needs to deliver the URL to the victim (send it by email, put the link on some other website). Once the victim follows the link, his browser will download the malicious JavaScript code and run it. This code can, for example, steal the victim's session cookies and send them to the attacker. To perform a persistent XSS, the attacker needs to store the malicious string on the server. The string should become a part of the web page delivered to other users, for example as a comment in a forum.

7.2.1 Detection

A basic XSS attack involves a <script> element, so such an element can be searched for in the URL. Two examples of captured requests:

/cgi-bin/cgicso?query=<script>...</script>
/listserv/wa.exe?SHOWTPL=<script>...</script>
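A check of this kind can be sketched in the same style as the earlier detectors. This is a minimal illustration assuming the detector simply looks for a script tag in the %-decoded URL; the actual expression used for the experiments may be broader.

import re
from urllib.parse import unquote

# Flag URLs whose decoded form contains an opening <script> tag.
XSS_RE = re.compile(r'<\s*script[^>]*>', re.I)

def is_xss(url: str) -> bool:
    return bool(XSS_RE.search(unquote(url)))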

Most of the other requests are related to various advertisement websites like googlesyndication.com, adkmob.com or openx.net. The URLs are usually long and complicated, and they often contain a link to another website. It is not easy to understand the purpose of the code inside, and it is possible that they also represent malicious traffic. Even those with a simpler format are hard to interpret correctly. One decoded URL sent to us-u.openx.net is shown below:

/w/1.0/pd?plm=5&ph=d7066e05-92d3-4e83-b4f2-cbee552a2f6b>