UNIVERSIDADE DE LISBOA
INSTITUTO SUPERIOR TÉCNICO

Unobservable Multimedia-based Covert Channels for Internet Censorship Circumvention

Diogo Miguel Barrinha Barradas

Supervisor: Doctor Luís Eduardo Teixeira Rodrigues
Co-Supervisor: Doctor Nuno Miguel Carvalho dos Santos

Thesis approved in public session to obtain the PhD Degree in Computer Science and Engineering

Jury Final Classification: Pass with Distinction and Honour

Jury
Chairperson: Doctor José Manuel da Costa Alves Marques, Instituto Superior Técnico, Universidade de Lisboa
Members of the Committee:
Doctor Luís Eduardo Teixeira Rodrigues, Instituto Superior Técnico, Universidade de Lisboa
Doctor Henrique João Lopes Domingos, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa
Doctor Mohammad Tariq Ehsan Elahi, School of Informatics, University of Edinburgh, UK
Doctor Bruno Emanuel da Graça Martins, Instituto Superior Técnico, Universidade de Lisboa

Funding Institutions
Universidade de Lisboa
Fundação para a Ciência e Tecnologia

2021

Acknowledgements

The work presented in this thesis was only made possible with the help of many people who crossed my path during the last few years. I am forever grateful to them.

I start by expressing my kind thanks to my supervisor, Prof. Luís Rodrigues, who first incentivized me to pursue a Ph.D.
Apart from instilling the value of hard work, from all the helpful discussions, and from the countless insights he shared with me regarding numerous academic affairs, I mainly thank him for continuously challenging me to deliver high-quality work and for pushing me beyond my limits. In like manner, I warmly thank my co-supervisor, Prof. Nuno Santos, for his constant and energetic encouragement, absolutely critical brainstorming sessions, and for the many hours invested in methodologically training me to conduct research, to write clearly, and to present ideas neatly. I am deeply honored by the opportunity to work, study, and learn under their guidance. Their fierce dedication has inspired me to pursue an academic career and I hope our paths may cross again in the future.

During my Ph.D., I had the opportunity to collaborate with and learn from other talented researchers and faculty. I thank my co-authors Salvatore Signorello, Fernando Ramos, Zachary Weinberg, and Nicolas Christin. Working with them gave me the opportunity to broaden the spectrum of my research and helped me grow as a scientist. I would also like to thank the faculty of the Distributed Systems Group (GSD) at INESC-ID for their willingness to exchange ideas and for the many times their valuable advice helped me shape and improve my research.

I would like to thank all my friends and colleagues at GSD for their companionship and support. I am grateful for all the great times we spent together, including the many meals, drinks, and coffee breaks shared over the last few years. In no particular order, I thank: João Loff, Tiago Brito, Miguel Coimbra, João Santos, Igor Zavalyshyn, Daniel Castro, Daniel Andrade, Paulo Silva, Pedro Joaquim, Manuel Bravo, Subhajit Sidhanta, Daharewa Gureya, Rodrigo Bruno, Richard Martinez, Paolo Laffranchini, Shady Issa, Cláudio Correia, Taras Lykhenko, João Neto, Mennan Selimi, and Pedro Raminhas.
A special thanks goes to Daniel Porto, the networking wizard who always found the time to discuss the details of my experimental setups with me.

Most importantly, I would like to thank all my long-time friends and family for their unconditional support; they were always there for me throughout this journey. I wholeheartedly thank Maria for her never-ending encouragement, for her patience to stay by my side despite the constant deadlines, and for shifting the weights on my work-life balance scale. Our evening strolls kept me sane during the long weeks when all experiments seemed to go nowhere.

This work has been partially funded by the Universidade de Lisboa (via the BD2018 doctoral scholarship) and by Fundação para a Ciência e Tecnologia (FCT) (via the SFRH/BD/136967/2018 grant, the project COSMOS with reference PTDC/EEI-COM/29271/2017, and the INESC-ID funding with reference UIDB/50021/2020).

To a world where people can speak their minds without fear.

Abstract

Totalitarian states are known to deploy large-scale surveillance and censorship mechanisms in order to deter citizens from accessing or publishing information on the Internet. Still, even the most oppressive regimes cannot afford to always block all channels with the outside world, and usually allow the operation of widely used services such as video-conferencing applications. This has given rise to the development of censorship-resistant communication tools that establish covert channels over the Internet by encoding covert data within popular multimedia protocols that use encrypted communication, e.g., Skype. A recent approach for the design of such tools, named multimedia protocol tunneling, modulates covert data into the audio and/or video feeds provided to multimedia applications.
However, depending on the techniques used to embed covert data, and on the amount of information to embed, multimedia protocol tunneling tools may generate network flows that differ subtly from legitimate flows that do not carry covert channels. Notably, such differences can be uncovered using strictly passive methods (e.g., by observing the length or inter-arrival time of network packets). Thus, one of the major challenges faced by these tools is achieving a proper balance between traffic analysis resistance and performance (e.g., achieving sufficient throughput to enable web browsing).

This thesis studies the efficacy of multimedia protocol tunneling tools in evading the censorship apparatus deployed by network adversaries, while providing sufficient performance to enable common Internet activities (e.g., web browsing). First, we show that the covert channels generated by existing tools are prone to detection. Specifically, we developed a new machine learning (ML)-based traffic analysis framework that breaks the security assumptions of recent multimedia protocol tunneling tools. Second, we show that network adversaries currently possess the means to perform sophisticated ML-based network flow classification tasks at line-speed. To this end, we worked towards the efficient deployment of multiple ML-based traffic analysis frameworks (including our own) in programmable switches. Third, we devised a new technique for creating traffic analysis resistant covert channels over multimedia streams. Our approach, based on the careful modification of the video encoding pipeline of the WebRTC framework, allows for the creation of high-speed covert channels over multimedia flows whose traffic patterns closely resemble those of legitimate flows.

Organization of the document: The dissertation is divided into two parts.
The first part provides the motivation and background for my work, identifies the main contributions of the thesis, offers an overview of the results achieved, and proposes directions for future work. The second part consists of a collection of the main papers that resulted from my work. This part reflects the content of the published papers, with minor formatting adjustments to fit the layout of the dissertation.

Resumo (Summary)

Nowadays, several repressive regimes employ Internet censorship mechanisms in order to prevent citizens from accessing or publishing content deemed sensitive. Despite this, most of these regimes allow the operation of services considered essential, such as video-conferencing applications. This fact has motivated the development of censorship-resistant communication tools that rely on hiding sensitive data in multimedia protocols, such as Skype, that use encrypted communication channels.

A promising approach followed in the design of these tools is based on modulating covert data into the video frames supplied to multimedia applications. However, depending on the modulation function and on the amount of data to encode, the use of multimedia protocol tunneling tools may produce network flows that differ from legitimate flows that carry no covert data. Notably, these differences can be identified through strictly passive methods (for example, by observing the length of the packets in the flows). Thus, one of the main challenges faced by multimedia protocol tunneling tools is striking a balance between the degree of traffic analysis resistance and the performance offered (e.g., achieving sufficient throughput to enable web browsing).

This thesis studies the possibilities offered by multimedia protocol tunneling tools with respect to the generation of covert channels that simultaneously offer high throughput and traffic analysis resistance. First, we show that existing tools are easily detectable. Specifically, we developed a new traffic analysis methodology based on machine learning (ML) that is capable of detecting recent tools with high accuracy. Second, we show that a contemporary censor possesses the means to perform ML-based classification of network flows in real time. Namely, we studied the use of ML-based traffic analysis techniques in programmable switches. Finally, we introduce a new technique for generating traffic analysis resistant covert channels over multimedia flows. Our approach, based on instrumenting the video encoding of the WebRTC platform, enables the creation of high-throughput covert channels over multimedia flows whose traffic patterns resemble those of legitimate flows.