Unconstrained Endpoint Profiling (Googling the Internet) Ionut Trestian Supranamaya Ranjan Northwestern University Narus Inc. Evanston, IL, USA Mountain View, CA, USA
[email protected] [email protected] Aleksandar Kuzmanovic Antonio Nucci Northwestern University Narus Inc. Evanston, IL, USA Mountain View, CA, USA
[email protected] [email protected] ABSTRACT Keywords Understanding Internet access trends at a global scale, i.e. , what do Google, endpoint profiling, traffic classification, clustering, traffic people do on the Internet, is a challenging problem that is typically locality addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges owing to either privacy concerns or to other operational difficulties. The key hypothesis of our work here is that most of the information needed to profile the 1. INTRODUCTION Internet endpoints is already available around us — on the web. Understanding what people are doing on the Internet at a global In this paper, we introduce a novel approach for profiling and scale, e.g. , which applications and protocols they use, which sites classifying endpoints. We implement and deploy a Google -based they access, and who they try to talk to, is an intriguing and im- profiling tool, which accurately characterizes endpoint behavior portant question for a number of reasons. Answering this question by collecting and strategically combining information freely avail- can help reveal fascinating cultural differences among nations and able on the web. Our ‘unconstrained endpoint profiling’ approach world regions. It can shed more light on important social tenden- shows remarkable advances in the following scenarios: ( i) Even cies ( e.g. , [36]) and help address imminent security vulnerabilities when no packet traces are available, it can accurately predict appli- (e.g.