Protecting Users Against Massive Attacks in Content Distribution Systems
UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO

FLÁVIO ROBERTO SANTOS

Slowing Down to Speed Up: Protecting Users Against Massive Attacks in Content Distribution Systems

Thesis presented in partial fulfillment of the requirements for the degree of Doctor of Computer Science

Prof. Ph.D. Luciano Paschoal Gaspary
Advisor

Prof. Ph.D. Marinho Pilla Barcellos
Coadvisor

Porto Alegre, July 2013

CIP – CATALOGING-IN-PUBLICATION

Santos, Flávio Roberto
Slowing Down to Speed Up: Protecting Users Against Massive Attacks in Content Distribution Systems / Flávio Roberto Santos. – Porto Alegre: PPGC da UFRGS, 2013.
103 f.: il.
Thesis (Ph.D.) – Universidade Federal do Rio Grande do Sul. Programa de Pós-Graduação em Computação, Porto Alegre, BR–RS, 2013. Advisor: Luciano Paschoal Gaspary; Coadvisor: Marinho Pilla Barcellos.
1. Content distribution systems. 2. Peer-to-peer systems. 3. File sharing. 4. Streaming systems. 5. Tagging systems. 6. Conservative strategies. 7. Delaying mechanisms. 8. Content pollution. 9. Massive attacks. I. Gaspary, Luciano Paschoal. II. Barcellos, Marinho Pilla. III. Title.

UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
Rector: Prof. Carlos Alexandre Netto
Vice-Rector for Academic Coordination: Prof. Rui Vicente Oppermann
Dean of Graduate Studies: Prof. Vladimir Pinheiro do Nascimento
Director of the Instituto de Informática: Prof. Luis da Cunha Lamb
PPGC Coordinator: Prof. Luigi Carro
Head Librarian of the Instituto de Informática: Alexsander Borges Ribeiro

“The difference between a successful person and others is not a lack of strength, not a lack of knowledge, but a lack of will.”
VINCENT THOMAS LOMBARDI

ACKNOWLEDGMENTS

First of all, I would like to thank my parents for allowing me to get where I am. I especially thank my mother, who is chiefly responsible for all my achievements.
Thank you to my brothers, Fábio and Felipe, and to all the relatives who in some way gave me a little push. Many thanks also to the members of the Networks Group at the Universidade Federal do Rio Grande do Sul, especially my brother/partner Weverton (Tchê Pará), for the countless discussions, drinking sessions, and trips. The journey would have been much harder without everyone's companionship and camaraderie.

Thank you to my advisors, Luciano Gaspary and Marinho Barcellos, for accepting the challenge of advising someone as stubborn as I am. Thank you for the excellence and dedication you always brought to guiding this work. I also record my gratitude to the professors and to the technical and administrative staff of the Instituto de Informática for their prompt assistance and the seriousness with which they carry out their work.

Thank you to the folks at Eraldo's boarding house and at Housing 158/164, the shared houses where I lived during the almost four years I spent in Porto Alegre. I learned a great deal from so many different people living together. Many pleasant conversations and moments of companionship made that stay unforgettable. Many thanks to all the friends I made in gaúcho lands.

I also thank my girlfriend, Camila, for her understanding and patience in enduring the distance during much of my doctorate. In that regard, TIM's Infinity plan, the late Webjet, and Skype made their important contribution (laughs).

There are also some people abroad whom I would like to mention. Special thanks to my advisor Burkhard Stiller, who gave me the opportunity to spend one year as a guest researcher in the Communication Systems Group at the University of Zurich, Switzerland. I would also like to thank all my friends and colleagues in Switzerland for the valuable discussions and the great time there. Vielen Dank!
Many thanks to everyone who contributed to my training as a researcher, including my friends at the Laboratório de Sistemas Distribuídos of the Universidade Federal de Campina Grande, where I earned my undergraduate degree. Finally, I hope one day to be able to give something back to the Brazilian people, who, even without knowing it, funded a large part of my studies.

CONTENTS

LIST OF ABBREVIATIONS AND ACRONYMS
LIST OF SYMBOLS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
RESUMO
1 INTRODUCTION
  1.1 Contributions
  1.2 Organization
2 BACKGROUND AND STATE OF THE ART
  2.1 Content distribution systems
    2.1.1 Classification and dimensions
    2.1.2 File sharing
    2.1.3 Streaming systems
    2.1.4 Discussions
  2.2 State of the art
    2.2.1 The content pollution and massive attacks
    2.2.2 Related work
  2.3 Summary
3 CONSERVATIVE APPROACH BASED ON BINARY VOTES
  3.1 FUNNEL model
    3.1.1 Overall strategy
    3.1.2 Estimating the number of concurrent downloads
    3.1.3 Adjusting the number of concurrent downloads
    3.1.4 Ensuring unique votes per user
    3.1.5 Incentives for users to vote
  3.2 Evaluation
    3.2.1 Experiment details
    3.2.2 Effectiveness of the mechanism
    3.2.3 Setting the number of concurrent downloads
    3.2.4 Impact of the mechanism on peer joins
  3.3 Considerations on the proposed solution
  3.4 Summary
4 EXTENDING THE MODEL TO DEAL WITH SUBJECTIVENESS
  4.1 DÉGRADÉ model
    4.1.1 Stable patterns in tag proportions
    4.1.2 Variation metric
    4.1.3 Adjusting the number of allowable concurrent downloads
  4.2 Evaluation
    4.2.1 Dataset details
    4.2.2 Evaluation scenarios
    4.2.3 Swarm-based content distribution fluid-based model
    4.2.4 Sensitivity analysis
    4.2.5 Results
  4.3 Considerations on the proposed solution
  4.4 Summary
5 GENERALIZATION OF THE MODEL
  5.1 Problem formalization
  5.2 Conservative strategy
  5.3 Evaluation
    5.3.1 Evaluation scenarios
    5.3.2 Baseline
    5.3.3 Results
  5.4 Considerations on the proposed solution
  5.5 Summary
6 SUMMARY, CONCLUSIONS, AND FUTURE WORK
  6.1 Summary of contributions
  6.2 Final remarks
APPENDIX A CAPÍTULO EM PORTUGUÊS
REFERENCES
LIST OF ABBREVIATIONS AND ACRONYMS

CCDF  Complementary Cumulative Distribution Function
CDF   Cumulative Distribution Function
CDS   Content Distribution Systems
DES   Discrete-Event Simulation
DHT   Distributed Hash Table
DoS   Denial-of-Service
HTTP  Hypertext Transfer Protocol
ISP   Internet Service Provider
LRF   Local Rarest First
MP3   MPEG-1 or MPEG-2 Audio Layer III
NAT   Network Address Translation
P2P   Peer-to-Peer
PEX   Peer Exchange
QoE   Quality of Experience
TFT   Tit-For-Tat

LIST OF SYMBOLS

R      Binary reputation calculated using positive and negative votes.
p      Number of positive votes issued by users.
n      Number of negative votes issued by users.
r      Threshold for R to stop controlling concurrent downloads.
A_min  Number of allowed concurrent downloads when R = 0.
A_max  Number of allowed concurrent downloads when R → 1.
A      Number of allowed concurrent downloads calculated in terms of R.
D      Number of concurrent downloads taking place in the system.
I      Initial number of seeders uploading a content when it is published.
C      Number of honest users who join the system to download a content.
M      Proportion of malicious users who arrive to attack the system.
∆      Vocabulary variation calculated using the relative frequencies of tags.
Φ_t    Relative frequencies of a tag t.
σ_t    Standard deviation for the relative frequencies of a tag t.
δ      Threshold for ∆ to stop controlling concurrent downloads.
λ      Arrival rate function which determines users’ behaviors.
λ      Rate at which users are allowed to join the system.
Q      Quality of experience metric used to evaluate the delaying strategy.
W      Average waiting time used to measure the overhead on users.

LIST OF FIGURES

Figure 2.1: Histogram of torrent status per groups of 20 movie titles, sorted by the proportion of copies marked as FAKE
Figure 2.2: CCDF of the number of completed FAKE copy downloads
Figure 3.1: When content reputation R exceeds r, it is deemed non-polluted (a), which occurs for a specific vote range (b)
Figure 3.2: Measurements on DHT support across BitTorrent communities
Figure 3.3: CDF of swarm sizes and distribution of users among these swarms
Figure 3.4: Effectiveness against pollution attacks with colluding peers
Figure 3.5: Number of downloads during the dissemination of a content
Figure 3.6: Proportion of peers (honest and malicious) joining the system during the dissemination of a content when FUNNEL is present
Figure