“An Orchestration Approach for Unwanted Internet Traffic Identification”
Total Page:16
File Type:pdf, Size:1020Kb
Federal University of Pernambuco Informatics Center Doctorate in Computer Science “An Orchestration Approach for Unwanted Internet Traffic Identification” Eduardo Luzeiro Feitosa Recife, August 2010. Federal University of Pernambuco Informatics Center Doctorate in Computer Science “An Orchestration Approach for Unwanted Internet Traffic Identification” Eduardo Luzeiro Feitosa This thesis has been submitted to the Informatics Center of the Federal University of Pernambuco as a partial requirement to obtain the degree of Doctor in Computer Science. Supervisor: Prof. Dr. Djamel Fawzi Hadj Sadok Co-Supervisor: Prof. Dr. Eduardo James Pereira Souto Recife, August 2010. Feitosa, Eduardo Luzeiro An orchestration approach for unwanted internet traffic identification / Eduardo Luzeiro Feitosa. - Recife: O Autor, 2010. xiii, 172 folhas : il., fig., tab. Tese (doutorado) Universidade Federal de Pernambuco. CIn. Ciência da Computação, 2010. Inclui bibliografia. 1. Redes de computadores. 2. Segurança da informação. 3. Medições de tráfego. I. Título. 004.6 CDD (22. ed.) MEI2010 – 0146 To my parents, wife and sons i Acknowledgments The fulfillment of this Thesis would not have been possible without the contribution of a large number of persons. The first persons I am deeply indebted to are my wife Livia Soraya and my children‟s Gabriel, Bruna and Luísa. While they did not contribute to this Thesis directly, but I would like to thank them for their support and love. I would also like to thank my parents, Clarice and José Ribamar, for their important support. I owe a big thank-you to my Thesis advisor, Djamel Sadok, who offered the opportunity to join his research team a few years ago. I must say that at that time, I was not really imagining the kind of experience I was about to embark on. I appreciated very much Djamel‟s very pragmatic approach to networking. Djamel is also an inexhaustible source of networking references. Through his enthusiasm and unlimited support (time, ideas and experience), he helped me to complete this Thesis. In addition to that, this work contains the fruits of many and lengthy discussions regarding the present contributions with Djamel. My sincere thanks also go to Professor Judith Kelner for giving me as well as to my family a great deal of support when we moved to Recife. She was instrumental in my participation in the GPRT group. I learned a lot from her valuable experience. I am also thankful for the excellent example she has provided as a successful researcher and Professor. The third person I would like to thank is my Thesis co-advisor Eduardo Souto. Souto has been a great friend from Manaus, where we worked together in the University‟s Data Processing Center (CPD). He was key to my decision for choosing Recife as the place to do my Doctorate studies. I learned a lot from the references he pointed me to and from the many discussions we had. We collaborated on a lot of problems, especially about the design of the OADS Miner. I am grateful to the external members of my Thesis examination committee, namely professors, Artur Ziviani, Carlos Alberto Kamienski, Nelson Rosa and Paulo Maciel, and must thank them warmly for accepting to be part of this work. I would especially like to thank Carlos Alberto Kamienski and Nelson Rosa for their detailed comments and recommendations when defending the proposal for this Thesis, more than a year ago. Let me also thank the members of GPRT for their direct collaboration and participation in this Thesis: Bruno Lins, Rodrigo Melo, Leo Vilaça, Thiago Rodrigues, and Fernando Rodrigues. Bruno Lins helped me to develop the Alert Pre-Processor tool. In addition to this collaboration, I am also obliged to thank him for using his tool for Dempster-Shafer analysis from C++ to Java in this Thesis. Rodrigo Melo helped implementing almost all the experiments presented in this Thesis. Leo Vilaça and I collaborated on the study of the frequent episode analysis and the joint development of the Alert Analyzer tool. Thiago Rodrigues and I collaborated on the characterization of the OASD Miner (ARAPONGA) tool and we had several discussions on how to turn this more efficient with the extraction of security information from the Web. Fernando Rodrigues helped me a lot improve the writing of many parts of this Thesis. ii Other members of GPRT that also deserve my thanks due to their contributions with different points of the study are: Rafael Aschoff, Guthemberg Silvestre, Patricia Endo, Luis Eduardo Oliveira, and Josias Junior. I would also acknowledge the work done by Manuela Melo e Nadia Silva and their contributions to the pleasant ambiance that reigns in GPRT. Lastly, I offer my regards and blessings to all of those who supported me in any respect during the completion of this thesis. Eduardo Luzeiro Feitosa August 30, 2010 iii Resumo Um breve exame do atual tráfego Internet mostra uma mistura de serviços conhecidos e desconhecidos, novas e antigas aplicações, tráfego legítimo e ilegítimo, dados solicitados e não solicitados, tráfego altamente relevante ou simplesmente indesejado. Entre esses, o tráfego Internet não desejado tem se tornado cada vez mais prejudicial para o desempenho e a disponibilidade de serviços, tornando escasso os recursos das redes. Tipicamente, este tipo de tráfego é representado por spam, phishing, ataques de negação de serviço (DoS e DDoS), vírus e worms, má configuração de recursos e serviços, entre outras fontes. Apesar dos diferentes esforços, isolados e/ou coordenados, o tráfego Internet não desejado continua a crescer. Primeiramente, porque representa uma vasta gama de aplicações de usuários, dados e informações com diferentes objetivos. Segundo, devido a ineficácia das atuais soluções em identificar e reduzir este tipo de tráfego. Por último, uma definição clara do que é não desejado tráfego precisa ser feita. A fim de solucionar estes problemas e motivado pelo nível atingido pelo tráfego não desejado, esta tese apresenta: 1. Um estudo sobre o universo do tráfego Internet não desejado, apresentado definições, discussões sobre contexto e classificação e uma série de existentes e potencias soluções. 2. Uma metodologia para identificar tráfego não desejado baseada em orquestração. OADS (Orchestration Anomaly Detection System) é uma plataforma única para a identificação de tráfego não desejado que permite um gerenciamento cooperativa e integrado de métodos, ferramentas e soluções voltadas a identificação de tráfego não desejado. 3. O projeto e implementação de soluções modulares integráveis a metodologia proposta. A primeira delas é um sistema de suporte a recuperação de informações na Web (WIRSS), chamado OADS Miner ou simplesmente ARAPONGA, cuja função é reunir informações de segurança sobre vulnerabilidades, ataques, intrusões e anomalias de tráfego disponíveis na Web, indexá-las eficientemente e fornecer uma máquina de busca focada neste tipo de informação. A segunda, chamada Alert Pre- Processor, é um esquema que utilize uma técnica de cluster para receber múltiplas fontes de alertas, agregá-los e extrair aqueles mais relevantes, permitindo correlações e possivelmente a percepção das estratégias usadas em ataques. A terceira e última é um mecanismo de correlação e fusão de alertas, FER Analyzer, que utilize a técnica de descoberta de episódios frequentes (FED) para encontrar sequências de alertas usadas para confirmar ataques e possivelmente predizer futuros eventos. De modo a avaliar a proposta e suas implementações, uma série de experimentos foram conduzidos com o objetivo de comprovar a eficácia e precisão das soluções. Palavras-Chave: Tráfego Internet não Desejado, Orquestração, Correlação de Alertas, Descoberta de Episódios Frequentes, WIRSS. iv Abstract A brief examination of the current Internet traffic shows a varying mix of known and unknown services, legacy and new applications, legitimate and illegitimate traffic, solicited and unsolicited data, highly relevant and unwanted traffic. Among these, unwanted Internet traffic is increasingly becoming harmful to network performance and service availability, often taking up processing and scarce network resources. Typically, unwanted traffic is represented, in general, by spoofing activities, spam, phishing, DoS and DDoS, virus and worms, misconfiguration, or among other sources. Nonetheless, there are many isolated and coordinated efforts to deal with this issue, unwanted Internet traffic continues to grow. First, because basically unwanted traffic represents a wide range of user applications, network data and harmful information with different objectives for its existence. Secondly, the inefficiency of the current solutions to identify, reduce, and stop unwanted traffic is notorious. The increase in Internet link bandwidth and service mix makes the timely detection of unwanted traffic an interminable task that does not scale easily as such links increase in capacity. Lastly, a clear definition of what is unwanted traffic remains to be elaborated. In order to address these problems and motivated by the current alarming situation that unwanted traffic has reached, this thesis presents: 4. A study of unwanted Internet traffic universe, presenting definitions, discussing about context and classifications, and a series of existing and potential solutions. 5. An approach to identify unwanted traffic based on orchestration defined as OADS. OADS (Orchestration Anomaly Detection System) is a single-platform for unwanted traffic identification management to allow an integrated management of all cooperative methods,