CERN Web Application Detection
Refactoring and release as open source software

by Piotr Lizończyk
Supervised by Sebastian Łopieński and Dr. Stefan Lüders
Summer Students Programme 2015
Geneva, 28 August 2015

Table of contents

1. Abstract
2. Project specification
   2.1. What is Web Application Detection (WAD)?
   2.2. Original project goals
   2.3. Additional achievements
3. Initial code assessment and refactoring
   3.1. Determining project usability for public audience
   3.2. Creating environment for code development
   3.3. Code refactoring
   3.4. Improving code maintenance
   3.5. Ensuring compatibility with Python 3
4. Public release of Web Application Detection
   4.1. Splitting WAD into public and CERN-specific parts
   4.2. Setting up continuous integration
   4.3. Providing license and creating readme
   4.4. Wrapping code as a Python package
   4.5. Publishing the package on Python Package Index (PyPI)
5. Integration with third party solutions
   5.1. Resignation from integration with OpenVAS
   5.2. Integrating WAD with w3af
   5.3. Integration with Kali Linux distribution
6. Adding new features
   6.1. Multiple output formats
   6.2. Login to SSO-protected websites
   6.3. Detecting proxies
   6.4. Bugfixes after release
7. Conclusion and outlook

1. Abstract

This paper covers my work as a participant in the CERN Summer Students 2015 programme. The project aimed at refactoring and publishing the Web Application Detection tool, which was developed at CERN and previously used internally by the Computer Security team.
The tasks performed ranged from the initial refactoring of the code, which had been developed as a script rather than as a Python package, through extracting the components that were not specific to CERN usage, to the subsequent release of the source code on GitHub and the integration with third-party software, namely the w3af tool. Ultimately, the Web Application Detection software received positive responses, having been downloaded ca. 1500 times at the time of writing this report.

2. Project specification

2.1. What is Web Application Detection (WAD)?

Web Application Detection is a website fingerprinting tool developed by the Computer Security team at CERN. It scans websites and web servers to identify the technologies and software they use. The tool is based on an open-source browser extension called Wappalyzer, originally developed by Elbert Alias [1]. It was used internally for years until the decision was made to make parts of it public.

The tool parses HTTP responses received from a scanned target, searching for traces that indicate the use of certain software. Detection results may contain details about the website, including, but not limited to, the operating system, web server, databases, content delivery networks, programming language, content management systems, frameworks, analytics tools and JavaScript libraries. Over 700 different technologies can be recognized, and this number is expected to grow, as WAD uses the Wappalyzer database, which is constantly extended. CERN employees contributed substantially to that database during the period of internal use.

2.2. Original project goals

Evaluation of WAD's usage results led to the conclusion that the tool is powerful and useful enough to be shared with the worldwide community. The intention was to let other people use it and, at the same time, contribute to it.
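To illustrate the detection approach described in section 2.1, the following minimal Python sketch matches regular expressions against response headers and page content. The rule set, its format and the function name detect are assumptions made for this example; they are not WAD's actual code or the real Wappalyzer database format.

```python
import re

# Hypothetical, heavily simplified rules for illustration only; the real
# Wappalyzer database is far larger and uses a richer format.
RULES = {
    "Apache": {"headers": {"Server": r"Apache"}},
    "PHP": {"headers": {"X-Powered-By": r"PHP"}},
    "WordPress": {"html": r"wp-content/"},
    "jQuery": {"html": r"jquery[.-][\w.]*\.js"},
}


def detect(headers, html):
    """Return the sorted names of all rules that match a response."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = dict((name.lower(), value) for name, value in headers.items())
    found = set()
    for app, rule in RULES.items():
        # Check every header pattern declared for this application.
        for name, pattern in rule.get("headers", {}).items():
            value = lowered.get(name.lower())
            if value is not None and re.search(pattern, value, re.I):
                found.add(app)
        # Check the page body pattern, if the rule has one.
        html_pattern = rule.get("html")
        if html_pattern and re.search(html_pattern, html, re.I):
            found.add(app)
    return sorted(found)


if __name__ == "__main__":
    sample_headers = {"Server": "Apache/2.4.10 (Unix)", "X-Powered-By": "PHP/5.6.2"}
    sample_html = '<script src="/wp-content/themes/x/jquery-1.11.1.min.js"></script>'
    print(detect(sample_headers, sample_html))
```

Keeping the rules as plain data, separate from the matching logic, mirrors the idea that the detection database can be updated independently of the code.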
This project focused on making the modifications necessary to release parts of WAD as open-source software and to make it easy to integrate with third-party tools, such as the OpenVAS and w3af vulnerability scanners.

2.3. Additional achievements

During my work on the project, several goals outside the original scope were accomplished. I refactored most of the codebase, making it cleaner and more extensible. I also automated tasks related to maintaining the project, namely updating the detection database and checking code correctness. Test coverage was improved and a continuous integration system was set up. The project documentation was revamped and extended. Numerous features were added, improving the overall usability of the tool. The public part of WAD can now be run with Python 2.6, 2.7, 3.2, 3.3 and 3.4.

[1] https://github.com/AliasIO/Wappalyzer

3. Initial code assessment and refactoring

3.1. Determining project usability for public audience

The first task was to evaluate whether there is demand for a publicly available project of this kind. WAD's predecessor, Wappalyzer, had a Python wrapper, but it was only a simple script running the original JavaScript implementation over a retrieved website. There was no pure Python counterpart, which seemed to prevent Wappalyzer from being used as a standalone tool or plug-in rather than as a browser extension [2].

This leads to another problem: the lack of actively developed, free, open-source website scanners. The only project found was WhatWeb, whose development almost stopped in 2012 [3], while Wappalyzer continues to receive support from the original author and many other developers. Web technologies evolve very quickly and detection software becomes outdated promptly, so stable, ongoing support is essential.

3.2. Creating environment for code development

The project was moved from the AFS-hosted Git repository to an online GitLab repository. Later, a public repository was created on GitHub [4], currently the most popular hosting service for open-source projects.

Continuous integration for WAD was set up using Jenkins CI [5]. It was configured to build and test the project's code automatically on every commit pushed to the GitLab repository. Additionally, code quality checks using the pep8 and pylint tools are run, and their results are available for review. Finally, Jenkins builds generate JUnit test results, which make it easy to track and analyse failing tests.

3.3. Code refactoring

WAD seemed to support Python versions as old as 2.4. I decided to drop support for Python versions older than 2.6, because fewer than 2% of Python users report regularly using those versions [6] and because it would otherwise be virtually impossible to make the code compatible with Python 3. This also reduced the package's set of dependencies, since some previously external libraries were later included in Python's standard library.

[2] https://github.com/andresriancho/w3af/issues/1081
[3] https://github.com/urbanadventurer/WhatWeb/graphs/contributors
[4] https://github.com/CERN-CERT/WAD
[5] https://jenkins.cern.ch/WAD
[6] http://www.randalolson.com/wp-content/uploads/python-survey-2014-versions-regularly-use.png

Due to WAD's former script-like nature, the code was not ready to be reused and released as a Python package. It used global variables to share data between files, kept executable code in the same file as tests and callable methods, and did not follow any popular Python coding convention.
Those problems were resolved by:

- Moving all tests to a separate directory, with a dedicated test file for each tested Python source file;
- Using a singleton class to hold shared data instead of global variables;
- Applying pep8 rules to the project's source code.

Additionally, code that wasn't directly related to the scripts' execution was extracted