Untangling the Web Finding Your Forgotten Assets

Bachelor of Science in Computer Science
June 2018

Victoria Sigurdsson

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science. The thesis is equivalent to 10 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author(s): Victoria Sigurdsson
University advisor: Associate Professor Emiliano Casalicchio, Department of Computer Science and Engineering

Faculty of Computing, Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Background.
Between 2016 and 2017, the number of attacks against web applications increased by approximately 21.89 percent, with a total of 6,502 recorded incidents in 2017 [9, 10]. Ensuring security requires patching and scanning, which assumes that a company is aware of all of its external-facing web applications. The company Outpost24 is observing increased demand for a solution capable of finding all external-facing web applications owned by one company.

Objectives.
This thesis studies six methods for identifying assets owned by one company. The methods are classified as weak or strong indicators, and two algorithms are developed based on this classification. The algorithms are executed against two companies, Outpost24 and Company A. The objective is to evaluate the six methods and decide whether they are suitable for retrieving assets owned by one company.

Methods.
This study includes two experiments that test the two algorithms on two different companies. The experiments focus on retrieving assets and the data needed to decide upon their ownership. The data observed in the experiments are compared against data known by the two companies to verify whether any of it was unknown to the company prior to the experiment.

Results.
The results show that the identified methods are suitable both for identifying assets and for deciding upon ownership. Furthermore, it was possible to identify assets that were not previously known. The results from the two experiments are visualized as two node maps, providing an overview of the identified assets.

Conclusions.
It was concluded that some methods are useful for extracting further assets from a given asset, while others are useful for extracting the data used to decide upon the owner. The methods will help companies raise awareness of their own external-facing assets and, in some cases, identify assets that were previously unknown to them.

Keywords: Forgotten assets, web application, asset retrieval, security

Contents
Abstract
1 Introduction
1.1 Research Focus
1.2 Research Process
1.3 Methodology
2 Background
2.1 Web crawling
2.2 Web Application Vulnerability Scanning
2.3 Domain Name System
3 Related Work
3.1 Relationship between websites
3.2 Hyperlinks
3.3 DNS lookup
3.4 Subdomain enumeration
3.5 Google Hacking
3.6 Shared indicators
3.6.1 Web Analytics Tools
3.6.2 Multi-Domain SSL Certificates
3.6.3 Authorship identification
4 Classification
4.1 Indicators
4.1.1 Strong indicator
4.1.2 Weak indicators
5 Implementation
5.1 Strong and weak indicators
5.1.1 Study: Strong Indicators
5.2 Logic
5.3 Decision
6 Results
6.1 Outpost24.com
6.2 Company A
7 Analysis and Discussion
7.1 Experiment 1 - Outpost24
7.2 Experiment 2 - Company A
7.3 Discussion
8 Conclusions and Future Work
References
Appendix A Strong Indicators
Appendix B Weak Indicators
Appendix C OUTPOST24 web portfolio
Appendix D Asset map
D.1 Outpost24
D.2 Company A

List of Tables
1 Difference between certificate types
2 Study of connections between domains
3 Unique assets owned by Outpost24
4 The frequency of methods used for asset retrieval
5 Outpost24 - The frequency of methods used for deciding owner

Glossary
asset: Umbrella term for a website, IP address or domain name.
indicator: Information extracted from an asset that is used when deciding upon ownership between assets.
owner: Person or company owning an asset.
strong indicator: An indicator that gives an almost 100 percent guarantee that the assets are owned by the same owner.
weak indicator: An indicator that, on its own, does not provide enough evidence of common ownership between two assets. However, considered together with other weak indicators, it increases the likelihood of common ownership.

Chapter 1 Introduction

Almost every company today needs an online presence.
One example is commercial stores, where an increasing shift to online shopping was observed between 2014 and 2017 [8]. The Internet allows companies to reach a broader market, reducing geographical limitations and boosting growth. Furthermore, companies' web applications are accessible to Internet users through search engines.

An online presence, however, introduces the risk of a breach. The number of incidents in which a web application is the attack vector increased by approximately 21.89 percent between 2016 and 2017, with a total of 6,502 recorded incidents in 2017 [9, 10]. One of the largest breaches of 2017 disclosed the personal data of 145 million people. A breach of this magnitude could damage the entire brand of a company and reduce its perceived reliability.

Maintaining an online presence while avoiding a breach requires monitoring and securing the web applications. Applying security patches and performing regular scans of the web applications further increases security [11, 6]. However, this assumes that the company is aware of all external-facing web applications it owns.

Outpost24, a company providing vulnerability management solutions, is observing an increased need for a solution capable of enumerating all external-facing web applications owned by one company. Knowing their external-facing web applications is of great importance for companies that want to protect their brand and confidential information, since every unmonitored web application poses a potential risk of a breach. The need originates from companies with poor or no knowledge of their external-facing assets, which also experience great uncertainty about their digital ownership.

The main focus of this research is on identifying methods to extract unknown assets from known assets.
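The core idea of starting from a known asset and repeatedly extracting further assets, while deciding ownership with the strong and weak indicators defined in the Glossary, can be illustrated as a breadth-first expansion. The following is a minimal Python sketch under stated assumptions, not the algorithms developed in this thesis: the indicator labels, the rule that one strong indicator or two distinct weak indicators establish common ownership, and the `discover` callback (standing in for methods such as DNS lookups, certificate inspection or hyperlink extraction) are all hypothetical. Candidates that share only some weak evidence are set aside for manual review instead of being accepted.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical indicator labels; which methods count as strong or weak
# here is an illustrative assumption, not the thesis's classification.
STRONG = {"shared_certificate"}
WEAK = {"shared_analytics_id", "same_registrant_org", "hyperlink"}
WEAK_THRESHOLD = 2  # assumption: two distinct weak indicators imply one owner

# Hypothetical discovery callback: returns (candidate asset, shared indicators).
Discover = Callable[[str], Iterable[Tuple[str, Set[str]]]]


def same_owner(shared: Set[str]) -> bool:
    """Decide common ownership from the indicators two assets share."""
    if shared & STRONG:
        return True  # a single strong indicator is treated as near-certain
    return len(shared & WEAK) >= WEAK_THRESHOLD


def expand(seed: str, discover: Discover,
           max_assets: int = 1000) -> Tuple[Dict[str, str], Set[str]]:
    """Breadth-first expansion from one known asset.

    Returns (found_from, review): found_from maps each accepted asset to the
    asset it was discovered from (the seed maps to itself); review holds
    candidates that shared some weak indicator but were never accepted.
    """
    found_from: Dict[str, str] = {seed: seed}
    review: Set[str] = set()
    queue = deque([seed])
    # The cap is checked before each asset is expanded, so it is a soft limit.
    while queue and len(found_from) < max_assets:
        current = queue.popleft()
        for candidate, shared in discover(current):
            if candidate in found_from:
                continue  # already accepted via another path
            if same_owner(shared):
                found_from[candidate] = current
                queue.append(candidate)
            elif shared & WEAK:
                review.add(candidate)  # ownership unclear: needs user guidance
    return found_from, review.difference(found_from)


# Toy data standing in for real lookups (reserved example domains only).
TOY = {
    "example.com": [
        ("shop.example.com", {"shared_certificate"}),
        ("example.net", {"shared_analytics_id", "same_registrant_org"}),
        ("partner.example.org", {"hyperlink"}),
    ],
    "example.net": [("blog.example.net", {"shared_certificate"})],
}

owned, review = expand("example.com", lambda asset: TOY.get(asset, []))
print(sorted(owned))
# ['blog.example.net', 'example.com', 'example.net', 'shop.example.com']
print(sorted(review))  # ['partner.example.org']
```

Breadth-first order expands the assets closest to the seed first, and the `found_from` mapping records which asset led to each discovery, which is enough to draw a simple node map of the result.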
The ability to determine ownership through the methods is crucial for the goal of identifying the external-facing assets owned by one company. Ownership, in this context and throughout the research, is defined as the company in charge of the asset, or a person employed by the company. This information will assist Outpost24 in providing a solution that meets the increased need seen among customers and prospects.

1.1 Research Focus

This study is driven by two research questions that are examined and answered in this thesis:

RQ1. What methods can be used to identify other assets from one known domain name?
RQ2. What is the accuracy in assuming the ownership of an asset, and in what cases is user guidance required?

1.2 Research Process

To answer the research questions, the research is divided into two phases, each providing an answer to one research question. The first phase is devoted to identifying different methods that can be used to extract unknown domains from given domains, answering the first research question (RQ1). The second and last phase concerns the categorization and evaluation of the methods, answering the second research question (RQ2).

1.3 Methodology

The methodology is based on the work of C.R. Kothari (2004), where four research approaches are discussed [4]. Each approach consists of two counterparts.

Descriptive vs. Analytical: This research is analytical, as data from companies is used to evaluate the theories.
Applied vs. Fundamental: A solution needs to be found for a practical problem, which makes the research applied.
Quantitative vs. Qualitative: This research is qualitative, as theories need to be defined for deciding upon the ownership of assets.
Conceptual vs. Empirical: This research tests hypothesis
