Google Dorks: Analysis, Creation, and New Defenses

Flavio Toffalini1, Maurizio Abbà2, Damiano Carra1, and Davide Balzarotti3

1 University of Verona, Italy
2 LastLine, UK
3 Eurecom, France

Abstract. With the advent of Web 2.0, many users started to maintain personal web pages to show information about themselves or their businesses, or to run simple e-commerce applications. This transition has been facilitated by a large number of frameworks and applications that can be easily installed and customized. Unfortunately, attackers have taken advantage of the widespread use of these technologies, for example by crafting special search engine queries to fingerprint an application framework and automatically locate possible targets. This approach, usually called Google Dorking, is at the core of many automated exploitation bots. In this paper, we tackle this problem in three steps. We first perform a large-scale study of existing dorks to understand their typology and the information attackers use to identify their target applications. We then propose a defense technique to render URL-based dorks ineffective. Finally, we study the effectiveness of building dorks by using only combinations of generic words, and we propose a simple but effective way to protect web applications against this type of fingerprinting.

1 Introduction

In just a few years from its introduction, the Web rapidly evolved from a client-server system for delivering hypertext documents into a complex platform for running stateful, asynchronous, distributed applications. One of the main characteristics that contributed to the success of the Web is that it was designed to help users create their own content and maintain their own web pages.