Application Log Analysis
Masarykova univerzita
Fakulta informatiky

Application Log Analysis

Master's thesis

Júlia Murínová

Brno, 2015

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Júlia Murínová

Advisor: doc. RNDr. Vlastislav Dohnal, Ph.D.

Acknowledgement

I would like to express my gratitude to doc. RNDr. Vlastislav Dohnal, Ph.D. for his guidance and help during the work on this thesis. Furthermore, I would like to thank my parents, friends and family for their continuous support. My thanks also belong to my boyfriend for all his assistance and help.

Abstract

The goal of this thesis is to introduce the log analysis area in general, compare available systems for web log analysis, choose an appropriate solution for sample data and implement the proposed solution. The thesis contains an overview of monitoring and log analysis, the specifics of application log analysis and definitions of log file formats. Various available systems for log analysis, both proprietary and open-source, are compared and categorized, with overview comparison tables of supported functionality. Based on the comparison and the requirements analysis, an appropriate solution for the sample data is chosen. The ELK stack (Elasticsearch, Logstash and Kibana) and the ElastAlert framework are deployed and configured for analysis of sample application log data. The Logstash configuration is adjusted for collecting, parsing and processing the sample data input, supporting reading from a file as well as online collection of logs over a socket. Additional information for anomaly detection is computed and added to log records during Logstash processing. Elasticsearch is deployed as the indexing and storage system for the sample logs.
Various Kibana dashboards for overall statistics, metrics and anomaly detection are created and provided. ElastAlert rules are set for real-time alerting based on sudden changes in event monitoring. The system supports two types of input, server logs and client logs, that can be reviewed in the same UI.

Keywords

log analysis, threat detection, application log, machine learning, knowledge discovery, anomaly detection, real-time monitoring, web analytics, log file format, Elasticsearch, Kibana, Logstash, ElastAlert, dashboarding, alerting

Contents

1 Introduction
2 Monitoring & Data analysis
  2.1 Monitoring in IT
  2.2 Online service/application monitoring
  2.3 Data analysis
    2.3.1 Big data analysis
    2.3.2 Data science
    2.3.3 Data analysis in statistics
  2.4 Data mining
  2.5 Machine learning
  2.6 Business intelligence
3 Log analysis
  3.1 Web log analysis
  3.2 Analytic tests
  3.3 Data anomaly detection
  3.4 Security domain
  3.5 Software application troubleshooting
  3.6 Log file contents
    3.6.1 Basic types of log files
    3.6.2 Common Log File contents
    3.6.3 Log4j files contents
  3.7 Analysis of log files contents
4 Comparison of systems for log analysis
  4.1 Comparison measures
    4.1.1 Tracking method
    4.1.2 Data processing location
  4.2 Client-side information processing software
  4.3 Web server log analysis
  4.4 Custom application log analysis
  4.5 Software supporting multiple log files types analysis with advanced functionality
  4.6 Custom log file analysis using multiple software solutions integration
5 Requirements analysis
  5.1 Task description
  5.2 Requirements and their analysis
  5.3 System selection
  5.4 Proposed solution
  5.5 Deployment
6 Application log data
  6.1 Server log file
  6.2 Client log file
  6.3 Data contents issues
7 Logstash configuration
  7.1 Input
    7.1.1 File input
    7.1.2 Multiline
    7.1.3 Socket based input collection
  7.2 Filter
    7.2.1 Filter plugins used in configuration
    7.2.2 Additional computed fields
    7.2.3 Adjusting and adding fields
    7.2.4 Other Logstash filters
  7.3 Output
    7.3.1 Elasticsearch output
    7.3.2 File output
    7.3.3 Email output
  7.4 Running Logstash
8 Elasticsearch
  8.1 Query syntax
  8.2 Mapping
  8.3 Accessing Elasticsearch
9 Kibana configuration
  9.1 General dashboard
  9.2 Anomaly dashboard
  9.3 Client dashboard
  9.4 Encountered issues and summary
10 ElastAlert
  10.1 Types of alert rules
  10.2 Created alert rules
11 Conclusion
  11.1 Future work
    11.1.1 Nested queries
    11.1.2 Alignment of client/server logs
12 Appendix 1: Electronic version
13 Appendix 2: User Guide
  13.1 Discover tab
  13.2 Settings tab
  13.3 Dashboard tab
  13.4 Visualization tab
14 Appendix 3: Installation and setup
  14.1 Logstash setup
  14.2 Elasticsearch setup
  14.3 Kibana setup
  14.4 ElastAlert setup
15 Appendix 4: List of compared log analysis software
16 Literature

1 Introduction

Millions of online accesses and transactions per day create large amounts of data that are a significant source of valuable information. Analysis of such volumes of data requires appropriate and sophisticated methods to process them promptly, efficiently and precisely. Data logging is an important asset in web application monitoring and reporting, as the logs contain massive amounts of data about the application's behavior. Analysis of logged data can be a great help with reporting malicious use, detecting intruders, assuring compliance and finding anomalies that might lead to actual damage.

In my master's thesis I will look into the main benefits of monitoring, web application service log analysis and log record processing. I will compare a number of available systems for collecting and processing log records, considering both existing commercial and open-source solutions. With regard to the sample data collected from a chosen web application, the most fitting solution will be chosen and proposed for the required data processing.
This solution will then be implemented, deployed and tested on the sample application log records.

Goals of this thesis are:
• Get familiar with the terms of monitoring, data mining and log record analysis;
• Investigate the possibilities and benefits of collecting and analysing log record data;
• Look into different types of log formats and the information they contain;
• Compare and categorize commercial and open-source systems available for log analysis;
• Propose an appropriate solution for the analysis of the sample log records, based on the previous comparison and the requisites;
• Implement the proposed solution, then deploy and test it on the sample data;
• Summarize the results of the implementation and list possible future improvements.

2 Monitoring & Data analysis

Monitoring¹ as a verb means: to watch and check a situation carefully for a period of time in order to discover something about it.

The fundamental challenge in the IT monitoring process is to adapt quickly to continuous changes and to make sure that cost-effective and appropriate software tools are used. The strength of the controlling process is based on both preventive and detective controls, which are also crucial parts of change monitoring. There might be some bottlenecks with regard to the different types of data that need to be monitored, as not all types of monitoring systems allow records to be logged. Automated data logging might also not be cost-effective, because it can slow down the processing of the data itself. Basically, the strategies for automated monitoring include IT-inherent, IT-configurable, IT-dependent manual or manual guidelines, and these need to be evaluated carefully considering the requisites and available resources. [1]

2.1 Monitoring in IT

For information technologies in particular, there are a few types of