
HIGH-THROUGHPUT SEQUENCING

Łukasz Roguski

TESI DOCTORAL UPF / 2017

DIRECTORS DE LA TESI

Dr. Paolo Ribeca
Dr. Sebastian Deorowicz

DEPARTAMENT DE CIÈNCIES EXPERIMENTALS I DE LA SALUT

The power for creating a better future is contained in the present moment: You create a good future by creating a good present. — Eckhart Tolle

ACKNOWLEDGEMENTS

First and foremost, I sincerely offer my gratitude to my supervisors Dr. Sebastian Deorowicz, from the Silesian University of Technology, and Dr. Paolo Ribeca, from the Pirbright Institute. Sebastian Deorowicz is an excellent researcher and professor whom I met during my undergraduate studies – his expertise has been a guide during my adventure with developing methods for processing large volumes of data. His vast knowledge of the area, advice and encouragement were essential while completing my master's thesis, and we have continued our fruitful collaboration on this work. I greatly appreciate his support and patience, always being able to count on him.

Paolo Ribeca is a talented researcher and a brilliant mind whom I have been fortunate to work with, starting our collaboration as an intern and extending it to a long research stay at Centro Nacional de Análisis Genómico (CNAG) in Barcelona. His vast expertise in the area, advice, support, and thinking “outside the box” helped me to tackle problems in the bioinformatics field from different perspectives and to understand the challenges of working with genomic data. I also greatly enjoyed crunching challenging problems together on the blackboard, with discussions full of a surreal sense of humor.

Furthermore, I am grateful to my tutor Roderic Guigó from Universitat Pompeu Fabra (UPF) and the Centre for Genomic Regulation (CRG), who is an excellent researcher and professor. He has always been welcoming, helpful, and motivating when looking for answers to the large number of questions in the field of biology (and challenges in the field of bioinformatics), and has encouraged me to pursue my goals in a scientific career.


I would also like to acknowledge all the organizations, teams and people I have collaborated with. First of all, my research stay at CNAG has been of great value. It allowed me to gain insight into a vast number of biological concepts and processes, which we try to understand empowered by sequencing technologies and using computational methods from the bioinformatics field. This would not have been possible without Ivo Gut, the director of CNAG, to whom I am grateful for supporting my research and projects and for valuable advice during all these years.

Furthermore, I would like to thank Andrzej Polański and Joanna Polańska from the Silesian University of Technology, who encouraged and supported my initial short-term stay at CNAG under the project INTERKADRA. Collaboration with Mikel Hernaez and Idoia Ochoa from the University of Illinois has also been a very valuable experience within our new project. I have learned a lot working together and I greatly appreciate our collaboration. I would also like to thank David Man for his comments and help on correcting the English grammar and style of this thesis (excluding this part, which was added at the very end).

While doing my research at CNAG, I have met and worked with fantastic people – I am grateful to all my close colleagues for these positive experiences. First of all, I would like to thank Leonor, one of the most skilled algorithmicians, a scientist, a dancer, and a volleyball player, who has always been supportive, encouraging and helpful; she is an exceptional person. I thank David, who has been a great companion from the beginning, showing that bioinformatics and karate actually have a lot of things in common – together with Anna P., whom I am also happy to have met, we spent a great time working together and exploring Barcelona after work.

Moreover, I would like to thank Marcos for fruitful discussions, and for his help and support both in solving various bioinformatics-related problems and in winning volleyball matches. My thanks also go to Enric, a passionate bioinformatician, with whom I shared a lot of outstanding moments, not only crunching problems in the world of bioinformatics. I would also like to thank Emilio, with whom I shared a significant number of coffees – together with his wife Giulia (a big ‘thank you’ to you too), they have always been supportive and helpful. I thank François, a true hacker in bioinformatics, for his support and fruitful, inspiring discussions. My thanks also go to my PhD colleague Santiago, a very skilled scientist, from whom I have learned a lot. Thank you Miranda for always sharing a lot of positive energy, never allowing me to stay down or get bored (especially during Spanish classes).

Furthermore, I would like to thank my colleagues Fran, Marysia, Elisabetta, Thasso, Tyler, Jéssica, Jordi, Anna E., Nando, Sophia, Raúl, Pablo, Miguel, Justin, Davide, and Irene, with whom I shared a lot of worthwhile conversations, coffees, breakfasts, lunches, and/or drinks.

On a personal level, I would like to express my sincere gratitude to my close friends from my hometown, Gliwice, who have supported me during this PhD journey in Barcelona. They have always been present, despite the distance, both in the easy and the difficult times, awaiting the moment I finally close the thesis; they were also checking my progress in person, from time to time, while in Barcelona. I would especially like to thank Marcin, who has always been to me like “a brother, but from another mother and father”. I thank Dominika, with whom I could always understand each other without words. I thank Tomek for his enthusiasm and creativity, never letting us get bored together. Furthermore, I thank Ania & Wojtek, Tobek, and Marta, as I could always count on you, sharing valuable moments together independently of the place and the distance. I would also like to thank Agnieszka, Hiszpan, Monika, and Hania, for always remembering and being around.

Moreover, I am grateful to all my flatmates (and friends) with whom I shared unforgettable moments (apart from the flat), as they were also indirectly involved in this PhD journey almost every day. Firstly, I thank Peter for sharing good vibes while sharing the flat in Barceloneta. This journey, however, would not have been so unique without Doron & Kamila, who gave me a sense of having my first family abroad – this was an amazing time in Gracia, thank you so much. I am also very grateful to Yannick for sharing a great time and part of life in Sants, both in our cozy self-assembled flat and in the bodega. Together with Yannick & Francesca, Leo & Marión, Ania & Stefan, Jeroen, Anna L., Fede, and Davide, we have collected a lot of unforgettable memories while living in Sants – thanks for being great people and thank you for making me feel “at home” in Barcelona.

Furthermore, my sincere gratitude goes to my other close friends in Barcelona, with whom I shared a lot of both positive and difficult moments, and who were always supporting me, despite me being (definitely too often) busy, working or “finishing” the thesis. First of all, I would like to especially thank Ewelina, whom I met almost at the beginning of the journey and who has always been close to me – our trips with Xavi, Pedro or Paola visiting the Costa Brava will remain great memories, among many others. Moreover, I would like to thank Damian for always staying creative, spontaneous, and not politically correct, guaranteeing a lot of fun together. My big thanks go to Doris & Remek and Olek & Mireia, for being warm, welcoming people and sharing a lot of positive energy (and some drinks). Many thanks to Oriol, for long and inspiring discussions; thanks a lot also for multiple hikes in the mountains and skiing together in the Pyrenees. Thank you Laia & Gerard for incredible hikes in the mountains together and for sharing great moments outside Barcelona (and, of course, also in the city). Loli & Joan, thank you for always encouraging me to stay creative and for experiencing the Sónar together. Thank you Sandie for your constant encouragement too, apart from the fantastic French crêpes you make. Many thanks go also to Ignasi, Solmaz, Edu and Pere, for sharing a lot of good moments and interesting discussions.

Finally, I am very grateful to my little family for their infinite support, presence, patience, and understanding – you have always been with me during this journey, despite me often being absent and fully focused on the thesis. I truly cannot express how grateful I am to my beloved Marina, who has always found positive energy to share, to smile, and to cheer up. We have both been going through tough experiences together, but always finding a way to stay happy and to enjoy even the little positive moments and successes on our way – with more positive and great moments very soon to come! I am also very grateful to my parents, who supported me as much as they could, both from a distance and while visiting me many times in Barcelona. My brother has always been an inspiration for me; his creativity, strength and stubbornness have reminded me to keep going, despite the difficulties – thanks a lot “byczku”. Moreover, my sincere gratitude goes to Montserrat & Francesc for their positive energy and for always cheering me up. Finally, I would like to thank my uncle Kazik, and Sumsi & Francesc, for their kind support. Without all of you, I would not have been able to arrive up to here. Thank you.

ABSTRACT

Thanks to advances in sequencing technologies, biomedical research has experienced a revolution over recent years, resulting in an explosion in the amount of genomic data being generated worldwide. The typical space requirement for storing sequencing data produced by a medium-scale experiment lies in the range of tens to hundreds of gigabytes, with multiple files in different formats being produced by each experiment. The current de facto standard file formats used to represent genomic data are text-based. For practical reasons, these are stored in compressed form. In most cases, such storage methods rely on general-purpose text compressors, such as gzip. Unfortunately, however, these methods are unable to exploit the information models specific to sequencing data, and as a result they usually provide limited functionality and insufficient savings in storage space. This explains why relatively basic operations such as processing, storage, and transfer of genomic data have become a typical bottleneck of current analysis setups.

Therefore, this thesis focuses on methods to efficiently store and compress the data generated from sequencing experiments. First, we propose a novel general-purpose compressor for FASTQ files. Compared to gzip, it achieves a significant reduction in the size of the resulting archive, while also offering high data processing speed. Next, we present compression methods that exploit the high sequence redundancy present in sequencing data. These methods achieve the best compression ratio among current state-of-the-art FASTQ compressors, without using any external reference sequence. We also demonstrate different lossy compression approaches to store auxiliary sequencing data, which allow for further reductions in size. Finally, we propose a flexible framework and data format, which allows one to semi-automatically generate compression solutions which are not tied to any specific genomic file format. To facilitate the data management needed by complex pipelines, multiple genomic datasets having heterogeneous formats can be stored together in configurable containers, with an option to perform custom queries over the stored data. Moreover, we show that simple solutions based on our framework can achieve results comparable to those of state-of-the-art format-specific compressors.

Overall, the solutions developed and described in this thesis can easily be incorporated into current pipelines for the analysis of genomic data. Taken together, they provide grounds for the development of integrated approaches towards efficient storage and management of such data.

RESUM

Gràcies als avenços en el camp de les tecnologies de seqüenciació, en els darrers anys la recerca biomèdica ha viscut una revolució, que ha tingut com un dels resultats l'explosió del volum de dades genòmiques generades arreu del món. La mida típica de les dades de seqüenciació generades en experiments d'escala mitjana acostuma a situar-se en un rang entre deu i cent gigabytes, que s'emmagatzemen en diversos arxius en diferents formats produïts en cada experiment. Els formats estàndards actuals de facto de representació de dades genòmiques són en format textual. Per raons pràctiques, les dades necessiten ser emmagatzemades en format comprimit. En la majoria dels casos, aquests mètodes de compressió es basen en compressors de text de caràcter general, com ara gzip. Amb tot, no permeten explotar els models d'informació específics de dades de seqüenciació. És per això que proporcionen funcionalitats limitades i estalvi insuficient d'espai d'emmagatzematge. Això explica per què operacions relativament bàsiques, com ara el processament, l'emmagatzematge i la transferència de dades genòmiques, s'han convertit en un dels principals obstacles de processos actuals d'anàlisi.

Per tot això, aquesta tesi se centra en mètodes d'emmagatzematge i compressió eficients de dades generades en experiments de seqüenciació. En primer lloc, proposem un compressor innovador d'arxius FASTQ de propòsit general. A diferència de gzip, aquest compressor permet reduir de manera significativa la mida de l'arxiu resultant del procés de compressió. A més a més, aquesta eina permet processar les dades a una velocitat alta. A continuació, presentem mètodes de compressió que fan ús de l'alta redundància de seqüències present en les dades de seqüenciació. Aquests mètodes obtenen la millor ràtio de compressió d'entre els compressors FASTQ del marc teòric actual, sense fer ús de cap referència externa. També mostrem aproximacions de compressió amb pèrdua per emmagatzemar dades de seqüenciació auxiliars, que permeten reduir encara més la mida de les dades. En últim lloc, aportem un sistema flexible de compressió i un format de dades. Aquest sistema fa possible generar de manera semiautomàtica solucions de compressió que no estan lligades a cap mena de format específic d'arxius de dades genòmiques. Per tal de facilitar la gestió complexa de dades, diversos conjunts de dades amb formats heterogenis poden ser emmagatzemats en contenidors configurables amb l'opció de dur a terme consultes personalitzades sobre les dades emmagatzemades. A més a més, exposem que les solucions simples basades en el nostre sistema poden obtenir resultats comparables als compressors de format específic de l'estat de l'art.

En resum, les solucions desenvolupades i descrites en aquesta tesi poden ser incorporades amb facilitat en processos d'anàlisi de dades genòmiques. Si prenem aquestes solucions conjuntament, aporten una base sòlida per al desenvolupament d'aproximacions completes encaminades a l'emmagatzematge i gestió eficient de dades genòmiques.

PREFACE

Sequencing has become an essential technique extensively used in biological research. Access to the genetic code of different organisms has allowed us to improve our understanding of the organic world surrounding us. We can thus start deciphering the underlying biological processes, trying to better grasp the history of life on Earth. More recently, sequencing has also become used in the context of precision medicine, complementing clinical decision-making and improving a number of treatments for common and rare human disorders. However, with continuous advances in high-throughput sequencing technologies, we also observe an explosion in the amount of genomic data being generated. The data produced by a single state-of-the-art experiment can hardly be processed on a single PC workstation anymore. Handling such vast amounts of genomic data poses a lot of algorithmic challenges, which are progressively being addressed by the extensive research currently going on in the field of bioinformatics. The most obvious practical problems are related to data storage and transfer. They effectively hamper efficient processing, analysis and sharing of the data, and generate significant costs for an adequate IT infrastructure. Therefore, this work focuses on improving current approaches to processing, compressing and storing high-throughput sequencing data.

Sequencing a DNA molecule that represents a target genomic region typically yields a collection of reads, i.e., a set of substrings originating from the sequence of the region. Hence, one needs either to reconstruct the sequence of the DNA target molecule from these reads by performing de novo assembly, or to understand their original placement by mapping/aligning the reads to a known reference sequence. After that, depending on the goals of the study, the relevant sequence analysis is performed. For instance, this might involve assessing the potential differences between the DNA molecule being studied and the corresponding molecule in the reference sequence. In practice, however, given the typical read length produced by current short-read technologies and the complex biological composition of some parts of the genomes, the input DNA molecule needs to be sequenced multiple times. This results in each experiment producing an output of millions of relatively short sequences, each accompanied by auxiliary information. All this is usually stored in a text-based data format, and typically takes up from tens to hundreds of gigabytes of storage space.

Efficient processing and analysis of such large amounts of data clearly remains a challenging task. For practical reasons the data is usually stored in compressed form, most commonly using methods relying on general text-based compression such as gzip. Although they can reduce the data to 20–30% of its original size, these methods are unable to exploit the information models inherent in genomic and sequencing data, and hence the compression ratio they provide is insufficient. Aside from storage considerations, genomic data analysis workflows are also hampered by the current (and de facto standard) text-based file formats (such as FASTQ or SAM) in which information is stored. Such formats are cumbersome and do not allow for a rich representation of the data generated by either the ever improving sequencing technologies or novel bioinformatics tools. In fact, the magnitude of such problems tends to increase with the size and complexity of the experiment performed, placing the bottleneck of data analysis pipelines on data management, storage, and transfer.

This dissertation is composed of 4 chapters and is structured as follows.

In Chapter 1 we provide a brief biological and technical background for this work. Firstly, we outline basic concepts in genomics and high-throughput sequencing technologies. Due to the fact that, at the moment, they are still used to generate the vast majority of genomic data, we primarily focus on second-generation short-read sequencing technologies. We then show the challenges facing the processing and storing of genomic data in a broader context. We describe a typical workflow for performing DNA re-sequencing data analysis, alongside the most commonly used data formats employed in the process. We put a particular emphasis on the FASTQ and SAM formats, which are used to represent raw sequencing reads and reads aligned to a reference genome sequence, respectively. The data stored in these formats usually take up most of the storage.

Going on, we outline a general data compression workflow and the steps it consists of. We briefly introduce the concepts used to compress text data, alongside the methods to model, encode and transform it.

In Chapter 2 we describe the state-of-the-art methods, and the major challenges, in compressing high-throughput sequencing data. As the data represented in FASTQ and SAM formats is composed of different kinds of information (such as DNA sequences, read identifiers, and base calling quality scores), we analyze different compression methods for each type of data. We first show approaches to compress data in FASTQ format, followed by SAM format. Although the use-cases for these formats are different, the latter can be seen as a superset of the former, and the typical approaches to compressing the data share a number of similarities and difficulties.

Then, we move on to outline alternative approaches to representing and storing genomic data. We show the limitations of the current formats and compression methods, and briefly show the challenges posed by the data generated by third-generation sequencing technologies.

Chapter 3 contains a compilation of the results of our research. As a starting point, we present DSRC2, a general-purpose, high-performance compressor for FASTQ files. Compared to the most commonly used gzip, it achieves a significant reduction in the size of compressed data and offers high data processing speeds, which makes it a tool well suited for typical everyday usage.

Next, we focus on methods to compress short reads, exploiting the high sequence redundancy which is especially present in data coming from deep sequencing experiments. We illustrate our findings with ORCOM, a specialized DNA-only compressor.

Subsequently, we present FaStore, a complete compressor for the FASTQ format based on the methods of ORCOM and DSRC2. Compared to state-of-the-art methods it achieves the highest compression ratio so far. We also explore several lossy compression approaches to store base quality scores and read identifiers.

After that, and moving away from specialized format-specific compressors, we present CARGO. CARGO is a general framework and data format which allows its users to semi-automatically generate compressors for any desired file format. All the user has to do is specify a record-based definition for the format, together with methods to parse records and, optionally, to perform data transformations on them. CARGO allows users to store multiple genomic datasets having heterogeneous formats in the same configurable container. As a proof of concept, we present a number of CARGO-based compression solutions. These allow the genomic data originally represented in FASTQ and SAM format to be stored in both a lossless and a lossy way in CARGO containers. The results achieved are comparable to those obtained by the best methods so far available.

As a last step, we show a brief summary of results, comparing our compression methods for FASTQ and SAM with the current state of the art.

Finally, in Chapter 4 we provide a discussion of, and outlook for, the methods we have developed. First, we comment on the results of compressing high-throughput sequencing data both in a lossless and a lossy way. We then elaborate on a strategy to integrate all our ideas into a single framework, showing other possible applications of our methods and the future research directions they would open up.

The dissertation ends with an Appendix, which contains supplementary materials for Chapter 3.

CONTENTS

Acknowledgements

Abstract

Preface

1 Introduction
1.1 High-throughput sequencing
1.2 Text data compression
1.3 Motivation

2 Storage of high-throughput sequencing data
2.1 Compression of raw reads in FASTQ format
2.2 Compression of mapped reads in SAM format
2.3 Alternative HTS data storage solutions
2.4 Storage and compression of long-reads data

Objectives

3 Results
3.1 DSRC2 – Industry-oriented compression of FASTQ files
3.2 ORCOM – Disk-based compression of data from genome sequencing
3.3 FaStore – A space-saving solution for long-term storing of raw sequencing data
3.4 CARGO: effective format-free compressed storage of genomic information
3.5 Brief summary

4 Discussion and outlook
4.1 HTS data compression workflow
4.2 Lossless compression of HTS data
4.3 Exploring lossy compression methods for HTS data
4.4 Integration of the developed methods
4.5 Future directions

Conclusions

Bibliography

A Supplementary materials

CHAPTER 1

INTRODUCTION

In this chapter we will show the main difficulties with representing, storing and managing the data generated using high-throughput sequencing technologies. Firstly, we will briefly explain the aim of DNA sequencing and give a short introduction to sequencing technologies from a historical point of view. We will explain important biological and technological concepts along with their limitations. Then, we will show what the typical human re-sequencing data analysis workflow looks like. We will do this by briefly explaining the data processing steps, the data formats used, and the challenges facing the typical workflow from the data storage and management point of view. Then, we will give a brief introduction to data compression. We will explain the typical data compression workflow using concepts and methods commonly used in data compression. The information presented in this chapter will be used in Chapter 2 when describing the available approaches for storing and compressing high-throughput sequencing data.

1.1 High-throughput sequencing

1.1.1 The aim of DNA sequencing

DNA sequencing is the process of determining the sequence of the nucleotides that compose a DNA molecule. It is primarily used to discover the sequences of individual genes, larger genomic regions, full chromosomes or even entire genomes. Having access to the genetic code of a given organism gives us the possibility of better understanding the organism in question along with the underlying biological processes. Additionally, by using RNA sequencing, one can gain an insight into the gene expression taking place inside the cell at any specific moment in time. The experiment can also be designed to focus on different scales – from sequencing an individual DNA molecule to a set of genes from one organism, or even sequencing the genomes of a whole population.

The whole-genome sequencing of James Watson [207] and Craig Venter [107] showed that humans share ~99.5% of the same genetic code. The differences present are known as genetic variations or just variants. The primary set of variants comes as a result of the recombination of genes during sexual reproduction involving germ cells. Another source of differences is genetic mutations. Mutations that occur during the sexual reproduction stage or during the normal replication of germ cells are transmitted to the offspring and are called germline mutations. On the other hand, the mutations that occur during a lifetime and which are not inheritable are called somatic mutations. Genetic variants which occur at a frequency > 1% in the population are known as common variants. Yet, more than 95% of the rare ones are predicted to have a medical or biological consequence [197].

Thanks to sequencing, it is possible to discover a number of genetic variants in the human genome and to better understand their consequences. The presence of selected genetic variants can already be linked with phenotypes, e.g., how they influence the blood type [138], height [211], or scalp hair features [6]. In normal conditions, the presence of some of these may have no particular effect on the organism, or may even help when fighting diseases (e.g., a mutation present in one pair of genes can provide resistance to the HIV virus [133]). However, some variants have already been associated with blood hypertension [106], type 2 diabetes [193], or Alzheimer's disease [38]. For some of them, a direct linkage has also been found with different types of cancer [148, 203, 86].

Apart from providing essential contributions to medicine, sequencing has been applied to better understand not only humans but also their evolution and the history of different populations and migrations [154, 200, 163, 140]. Sequencing has also been applied to improve genome engineering techniques in agriculture [202]. Moreover, discovering novel micro-organisms present in the environment through sequencing has also become possible, leading to the development of a new research field studying microbial communities – metagenomics [198]. Finally, sequencing can offer more reliable individual identification methods to be applied in the forensic sciences [214], as a result of the fact that no two humans have exactly the same genome, and some parts of it can be used as an efficient fingerprint.

1.1.2 DNA sequencing technology

Sanger sequencing

Since the discovery of the basic mechanisms of heredity and DNA structure, sequencing methods have always been of great interest. In the '70s, Maxam and Gilbert [137] and, in parallel, Sanger [180] developed the first sequencing methods. However, as the former technique involved the use of hazardous chemicals, the Sanger method ultimately prevailed, being the better and also the less complex process.

The Sanger sequencing method is based on DNA polymerase, the enzyme used to synthesize a DNA molecule from deoxynucleotides. At the beginning, the double-stranded DNA fragments need to be denatured, i.e., heated up to split them into template and complementary strands. The obtained template strands are then divided into four separate reactions, each corresponding to the detection of a different nucleotide – either A, C, G or T. Next, DNA polymerase, primers, deoxynucleotides and dideoxynucleotides (performing as chain-terminating inhibitors of DNA polymerase) are added to each reaction in the following way. If the given reaction is to detect A nucleotides, it will contain ddATP dideoxynucleotides and the other remaining deoxynucleotides (i.e., dCTP, dGTP and dTTP). The primers or chain-terminating nucleotides are also previously radioactively tagged. In this way, the complementary strand will be synthesized by DNA polymerase by incorporating the present nucleotides along the template strand (stopping the reaction on encountering chain-terminating inhibitors). As a result, multiple complementary strands of different lengths will be synthesized in each reaction. Finally, gel electrophoresis is applied to isolate and to analyze the synthesized DNA fragments by their length. The output is an X-ray image of the gel (one example is presented in Fig. 1.1), which allows one to decode the DNA sequence. The initial Sanger sequencing method allowed, for the first time, the sequencing of a full genome, namely that of the phi X 174 bacteriophage [179].

Figure 1.1: Decoding a DNA sequence using as output: (A) an autoradiograph from the Sanger method (source: https://en.wikipedia.org/wiki/Sanger_sequencing), (B) a chromatogram from the dye-based automated Sanger method (source: https://seqcore.brcf.med.umich.edu/sites/default/files/ /interpret.html).

The initial Sanger method turned out to be very successful and made it possible to explore the genomes of microorganisms for the first time. It also went through a number of subsequent optimizations. The input DNA fragment could now be automatically cloned using bacteria to generate multiple copies of the same fragment. The average number of clones of each fragment is related to the sequencing depth, or coverage. Generally speaking, coverage refers to the number of times a single nucleobase at a specific position in the sample has been sequenced (or how many sequenced fragments it is covered by). However, one of the most important modifications to the original protocol was the introduction of nucleotide-specific dyes [188]. This allowed a single reaction to be carried out to detect the DNA sequence instead of four (one per nucleotide). Such a modification also allowed for the detection of the nucleotides by using a fluorescent sensor (using a different wavelength per nucleotide dye color), thereby replacing the X-ray step. The result of sequencing by this enhanced protocol is presented as a chromatogram, where each color peak represents a detected nucleotide – the process of assigning peaks to nucleotides is called base calling. Fig. 1.1 presents a comparison of the output image (X-ray autoradiograph) obtained by the classical Sanger method and the output (chromatogram) obtained by using the dye-based automated method. The first commercially available automated DNA sequencer was the ABI 370A DNA Sequencing System, introduced by Applied Biosystems (ABI) in 1986. The automation of the sequencing process allowed sequencing projects to be carried out on a larger scale and led to a revolution in gene discovery. The automated Sanger method is considered the first-generation sequencing technology and has been successfully applied in a variety of sequencing projects, most notably by providing a high-quality human genome assembly [81, 201, 107].

The automated Sanger method, however, has a number of limitations. First of all, it has a relatively low sequencing throughput due to a slow template preparation process and the carrying out of the enzymatic reactions [144]. The chain-termination method can also only be used to sequence short DNA sequences, up to 1000 base pairs (bp). Longer sequences, e.g., chromosomes or genomes, need to be chopped into smaller fragments and sequenced independently. Such a sequencing process is called shotgun sequencing, and the nucleotide sequences obtained from reading these small fragments are called reads (or short reads). Moreover, the resulting sequence reads have average lengths rlen significantly shorter than the length flen of the chopped input DNA molecule fragments. Therefore, the initial molecule sequence needs to be reconstructed from these small reads using complex, computationally intensive methods in order to perform any meaningful analysis (as explained in more detail in Section 1.1.3). With these limitations in mind, the selection of a proper sequencing depth is one of the key considerations in biological analyses.

There exists a direct correlation between the required sequencing depth, the DNA molecule length, and the average read length offered by the sequencing instruments. As proposed by Eric Lander and Michael Waterman [98], the initial sequence reconstruction problem can be viewed as a mathematical covering problem, where the target is considered “sequenced” when an adequate coverage is achieved or when no gaps remain. Given randomly distributed (sequenced) reads of length rlen across a haploid genome of length Glen and the number N of reads generated (e.g., during a single sequencing run), the expected coverage C can be calculated as:

$$C = \frac{N \, r_{len}}{G_{len}} \qquad (1.1)$$

Moreover, the authors also state that the number of times n a nucleobase has been sequenced follows a Poisson distribution and is defined as [79]:

$$P(x = n) = \frac{C^{n} e^{-C}}{n!} \qquad (1.2)$$

This sequencing model was used when designing the Human Genome Project [97], a large-scale international project to initially discover the DNA sequence of the human genome. Nonetheless, the model is quite simplified, as it does not address some real problems, such as biases related to the composition of the genomes, their different ploidy levels, potential biases introduced by the chemistry used, sequencing errors, or difficulties in computational sequence reconstruction from short reads in highly repetitive genomic regions (see Section 1.1.3). Therefore, the model underwent a number of improvements, which won't be covered here. A good overview of sequencing depth criteria in the context of different biological analyses can be found in [187].
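To make Eqns. 1.1 and 1.2 concrete, the following minimal Python sketch computes the expected coverage and the Poisson probability that a nucleobase is sequenced a given number of times; the read count, read length and genome size used in the example are illustrative values only and are not taken from the text.

from math import exp, factorial

def expected_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    # Eqn. 1.1: C = N * r_len / G_len
    return n_reads * read_len / genome_len

def prob_sequenced_n_times(coverage: float, n: int) -> float:
    # Eqn. 1.2: Poisson probability that a nucleobase is sequenced exactly n times
    return coverage ** n * exp(-coverage) / factorial(n)

# Illustrative example: 600 million reads of 150 bp over a ~3.1 Gbp haploid genome
C = expected_coverage(600_000_000, 150, 3_100_000_000)
p_uncovered = prob_sequenced_n_times(C, 0)  # probability that a given base is never sequenced
print(f"expected coverage: {C:.1f}x, P(base never sequenced): {p_uncovered:.2e}")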

Next-generation sequencing technologies

Since the introduction of the first commercially available DNA sequencer, a race began to increase sequencing throughput and decrease sequencing time, while also reducing costs. Therefore, the initial automated Sanger sequencing method went through a number of changes and improvements.

The sample preparation steps have been simplified and further automated. The bacterial cloning step has been replaced with the generation of a sequencing library, a large collection of short genomic fragments from the input DNA molecule. Most importantly, during this step, platform-specific synthetic sequences (i.e., adaptors) are attached to the DNA fragments, which are later amplified by a polymerase chain reaction (PCR), creating many copies of them. The instruments perform a fully automated sequencing of multiple fragments in parallel in a repeatable manner, producing up to hundreds of millions of reads as a result. Hence, the techniques are called high-throughput sequencing (HTS), second-generation sequencing or massively parallel sequencing methods, and until a few years ago were commonly referred to as the (then future) next-generation sequencing (NGS) methods.

Due to the changes in the sequencing protocols and the instruments themselves, the sequencing process as a whole (from an input DNA molecule to a collection of sequenced reads in digital form) is nowadays characterized by a set of different error probabilities, which have an impact on the final reported nucleobases. These include, for example, the probability of an error occurring during the sample preparation or amplification steps (e.g., alteration of a nucleobase during PCR), the probability of a machine-specific error occurring while sequencing a nucleobase (or a set of them), or the probability that a nucleobase was called incorrectly during base calling. As a general feature, while sequencing throughput has been greatly improved, reads obtained with second-generation sequencing are much shorter than those produced by the Sanger method (see Table 1.1 for a summary). As shown by Eqn. 1.1, when reducing the available read length, more reads now need to be generated if one wants to provide a coverage of the sequenced genomic region comparable to that offered by the automated Sanger method. Therefore, the great increase in sequencing throughput and the reduction in read size have shifted costs to the bioinformatics side, making analysis computationally more complex – as explained in detail in Section 1.1.3. Notably, the automated Sanger sequencing method is still being used today, but primarily as a confirmatory protocol – typically to assess the results from shotgun-sequencing-based methods, on selected, relatively small DNA fragments up to a few kilobases (kb) in length.

To partially mitigate the problem of reads being shorter than those offered by Sanger sequencing methods, selected NGS platforms introduced the ability to read the DNA sequence from both ends of the library fragments. For a specially prepared library, the fragments can be sequenced either in paired-end or mate-pair mode (in addition to the standard single-end sequencing mode). The main difference between the modes (apart from the costs related to the sequencing protocol) is the expected length of the fragments, which is set up during the library preparation step. The length of the fragment determines the distance between the reads sequenced from both of its ends, which is called the insert size. With this in mind, paired-end libraries can typically sample a DNA region of ~300–500 bp. Mate-pair libraries can sample a larger region of ~1.5–20 kilobases (kb) [129], which can give a better insight into possible complex structural information of the genome. The details of the chemistry used, the underlying reactions and the engineering methods of the DNA sequencing processes used by the instruments are specific to the platform. A summary of the main features of some selected past and current platforms is presented in Table 1.1.

To partially mitigate the problem of reads being shorter than those ones of- fered by Sanger sequencing methods, selected NGS platforms introduced the ability to read the DNA sequence from both ends of the library frag- ments. For a specially prepared library, the fragments can be sequenced either in paired-end or mate-pair mode (in addition to the standard single- end sequencing mode). The main difference between the modes (apart from the costs related to the sequencing protocol) is the expected length of the fragments, which is set up during the library preparation step. The length of the fragment, determines the distance between the sequenced reads from the fragment’s at both ends, which is called insert size. With this in mind, paired-end libraries can typically sample a DNA region of ~300 500 bp. The mate-pair can sample a larger region of ~1.5 20 kilo- − − bases (kb) [129], which can give a better insight into possible complex structural information of the genome. The details of the chemistry used, the underlying reactions and the engineering methods of the DNA se- quencing processes used by the instruments are specific to the platform. A summary of the main features of some selected past and current platforms Chapter 1. Introduction 8 1%* ≤ 13%¶ 0.1% 0.1% − ≤ ≤ error rate 6 d (PE) 0.1% − 3 h 0.001% 4 h 1% 56 h 0.1% 3 d 0.1% − − − < 550 Gb (PE) 29 h (SE) or 2.5 − 84 kb 0.3 15 Gb 21 8 Gb 2.5 10 Gb 4 h 13% or 200 Gb 24 h − − − − − 900 Gb per flow cell − 72 Gb (SE) or 180 − ]. The comparison excludes the prices of instruments and costs per sequenced Gb, as these are 65 , 62 125 (PE) 64 − 15 k 5 − 400 (SE) 3 300 (PE) 3.3 − 900, up to 2k 1.9 100 (SE/PE)1000 (SE/PE) 40 700 Mb 23 h 1% − − − − 75 (SE) or 50 (PE) 160 Gb (SE) or 320 Gb (PE) 10 d − 400 Table 1.1: Overview of selected sequencing platforms. Roche/454 GS FLX Titanium XL+ThermoFisher Ion Torrent S5 530 700 200 Illumina MiSeq v3 75 Illumina HiSeq 2500 v4Illumina HiSeq X 36 (SE) or 50 150 (PE) 800 Oxford NanoPore MK1 MinION Up to 200 k Up to 1.5 Gb Up to 48 h 2 ThermoFisher ABI SOLiD 5500xl 50 Illumina Synthetic Long-Read10X GenomicsPacific BioSciences Sequel ~100 k ~10 Up to 100 k Same as HiSeq 2500 Same as HiSeq 2500 BGISEQ-500 FCL 50 ThermoFisher ABI 3730xl DNA Analyzer Sequencing platformFirst-generation sequencing – Sanger sequencing Second-generation sequencing – massively parallel short reads sequencing Read length (bp) Sequencing throughputSecond-generation sequencing – massively parallel synthetic long reads sequencing Run timeThird-generation sequencing – single molecule real-time long reads sequencing Sequencing Data based on own research, products’ brochuresnormally and highly publications dependent [ on various factorsthe including sequencing individual mode: special single discounts pass on or machines, circular chemistry, bioinformatics consensus infrastructure read. costs ¶Using and the staffing. newest R9 *Depending chemistry on and depending on the sequencing mode: 1D or 2D. 9 1.1. High-throughput sequencing is presented in Table 1.1. As the sequencing technology has been under intensive development in the last 25 years, here the focus is on the most important or influential ones available on the market – a more compre- hensive technological review with some historic perspective and methods descriptions can be found in [128, 141, 62, 144, 130,65].

Sequencing platforms can be divided into two main categories depending on the underlying sequencing method used, which is either sequencing by ligation or sequencing by synthesis – a detailed description of the dif- ferences between these sequencing methods and platforms can be found in [65, 141]. Applied Biosystems SOLiD (Sequencing by Oligonucleotide Ligation and Detection) or Complete Genomics (now BGISEQ) are the platforms based on the former method. These platforms provide very high sequencing throughput, however obtained reads are relatively very short, thus limiting their usefulness in further data analysis steps.

The Roche/454 FLX Pyrosequencer [131], introduced in 2004, was the first commercially available NGS platform based on a sequencing-by-synthesis method. Together with the ThermoFisher IonTorrent platform, it offers relatively longer reads, yet with a relatively small sequencing output. Solexa was another company to develop a technology based on the sequencing-by-synthesis method, and was later bought by Illumina. Nowadays, Illumina offers a variety of different sequencing platforms depending on the resources and the needs. Their platforms account for the generation of the majority of the overall sequencing data today. Notably, in 2014 Illumina introduced the HiSeq X sequencing system¹, a set of either 10 (HiSeq X Ten) or 5 (HiSeq X Five) connected sequencers. It is based on the previously successful HiSeq technology and is aimed at large-scale human whole-genome sequencing experiments. HiSeq X Ten is able to provide > 1800 genomes per year when running full-time. The platform is also the first to break the milestone of $1000 in amortized² cost per whole-genome re-sequencing (i.e., sequencing an individual when the reference genome of the species is available). Although Illumina platforms also produce reads of relatively short length, the robustness of the platform, the low sequencing error rate and the resulting wide adoption have made it somewhat of a “standard” in terms of NGS technology. It is important to note, however, that the reported sequencing error rate is related to the sequencing platform and the chemistry used, and may vary depending on the experimental setup.

¹ https://www.illumina.com/systems/hiseq-x-sequencing-system.html
² Amortized costs include: work and management costs, utilities and reagents, cost of sequencing instruments (amortized over three years), IT protocols cost and some indirect costs.

Long-reads sequencing technologies

While in many aspects second-generation technologies have already been broadly applied, other technologies, labeled as third-generation sequencing (TGS) methods, are rising and gaining a lot of positive attention. Most of these methods offer real-time sequencing, a feature not available in second-generation platforms. Their main advantage, however, lies in the ability to sequence a single DNA molecule at a time, without the need for prior DNA fragmentation and clonal amplification. As previously noted, the most limiting factor of second-generation technologies is that they offer very short read lengths compared to the size of the whole input DNA molecule. Being able to produce very long reads, TGS technologies give rise to the possibility of performing an analysis of very complex and highly repetitive genomic regions, which are common and whose roles are important from a biological perspective. For example, in the human genome repeated DNA fragments make up ~50% of the total genome [96]. Moreover, depending on the experiment, in order to achieve a reliable genomic coverage of the sequenced region, NGS technologies need to generate a large number of short reads (as denoted by Eqn. 1.1). On the other hand, the longer the reads, the smaller the number of them that needs to be generated in order to provide reliable results.

Pacific Biosciences was the first to commercialize this new type of sequencing by introducing single-molecule real-time (SMRT) sequencing technology [50]. SMRT was initially implemented in the PacBio RS platform, which was released in 2010. The technique uses a specialized flow cell with a transparent bottom, which contains thousands of individual wells (called zero-mode waveguides (ZMWs)) with DNA polymerase attached to them [105]. Before sequencing, the DNA molecule needs to be specially prepared, by ligating hairpin adaptors to both ends of the input DNA molecule, creating as a result a circular template sequence. When sequencing, the DNA molecule passes through the stationary polymerase, emitting fluorescent light when nucleotides are incorporated into the second strand. In this way, the light sensor can focus on decoding the single passing molecule. As the SMRT method uses a special circular template that allows each sequence to be read multiple times, the sequencing error rate is greatly reduced. By further reducing the error rate and the operating costs while simultaneously improving the sequencing throughput, PacBio technology is becoming a strong competitor in the current market dominated by the previous-generation methods. De novo sequence reconstruction (sequence assembly, see Section 1.1.3 for a description of the assembly process) is one of the most successful applications of PacBio sequencing. It has already been applied to a variety of projects, thereby improving existing genomes [17] obtained previously by NGS or Sanger technologies. A notable contribution is a significant update of the human genome assembly [17, 26], closing more than 50 existing gaps and discovering new variants, especially the long structural ones. A more comprehensive review of successful PacBio applications can be found in [169].

Another TGS technology is nanopore sequencing [31], developed and commercialized by Oxford Nanopore Technologies (ONT) with its first platform – the MinION. Unlike the SMRT technique, when reading the input DNA molecule the device can directly detect the DNA nucleotide sequence from a native molecule without needing any special chemistry, light-detection methods or secondary signals. While performing sequencing, the template sequence passes through a very small protein pore, which results in detectable voltage changes. Those continuously sampled differences in voltage are then interpreted as possibly different short nucleotide subsequences. As the detection process can be quite error-prone (especially when reading long homopolymers [65], i.e., sequences of identical nucleobases), before sequencing a hairpin is ligated onto both ends of the input DNA molecule, allowing it to be read from both strands (similarly to what happens in SMRT). In this way, a sequence can be read both from the forward and the reverse strand. If the sequence has been read only from one strand, it is called a 1D sequence. If the sequence has been read from the second strand too, it can later be aligned to the one read from the first, creating a more precise 2D sequence. Similar to PacBio, ONT technology can be especially useful in de novo assembly [120] or metagenomics [67] experiments. Notably, thanks to the high portability of the MinION sequencer (being the size of a USB pendrive) and the uncomplicated sequencing process, it has been successfully tested live in analyzing the recent Ebola outbreak in Africa [165]. The results of the analysis were available in less than 24 h after receiving the Ebola-positive sample, with the sequencing process taking from 15 min to 1 h. ONT technology is still very young, yet it seems very promising.

There also exists another approach for generating long reads which is not formally considered part of third-generation sequencing. This is a hybrid approach, based on NGS technology and categorized as synthetic long-read technology. It makes it possible to create relatively long reads by using additional steps during library preparation, namely barcoding. While creating a library, the input DNA molecule is partitioned into relatively long chunks (~10 kb). Next, these fragments are put into separate wells, each corresponding to a different barcode. The fragments are then further fragmented into shorter ones, as when using standard NGS protocols. Finally, the well's barcode is added to the fragmented short sequences. The reads obtained in this way can later be sequenced using the existing NGS platforms. The barcoding information is then used during local assembly steps to link the reads coming from the same, larger fragment. Synthetic long-read technology has already been implemented in Illumina TruSeq library preparation kits³ and in 10X Genomics solutions⁴. Although it is a relatively young technology, some sample metagenomics experiments exploring human microbiome diversity [94] have already proven the usefulness of this approach. There are some downsides, however, as such techniques can have trouble when sequencing highly repetitive genomic regions. Therefore, they can be considered as an intermediate technology between the second- and third-generation sequencing technologies.

The perspective

The Human Genome Project [97] was a large-scale project that initially discovered the DNA sequence of the human genome. The project started in 1990, took 13 years, and consumed around 2.7 billion USD, being also the world's largest collaborative biological project. With the introduction of second-generation sequencing platforms in 2004, the cost of re-sequencing a human genome was reduced to an estimated 20 million USD. Since then, it began to fall dramatically. Today, we have practically reached the symbolic barrier of 1000 USD in amortized costs for re-sequencing the genome of an individual. The reduction of sequencing cost over time is visualized in Fig. 1.2.

³ http://www.illumina.com/products/truseq-synthetic-long-read-kit.html
⁴ http://www.10xgenomics.com/technology/

Figure 1.2: Cost reduction over time of human genome sequencing. Source: https://www.genome.gov/27565109/the-cost-of-sequencing-a-human-genome/

Thanks to advancements in sequencing technology, a variety of important large-scale studies, comprising thousands of individuals, have been accomplished. Of the highest importance are human population studies trying to discover genetic diversity and links to possible genetic disorders. The most famous one – the 1000 Genomes Project [1, 2] – was a multi-phase project, launched in 2008 with the aim of discovering the most common human genetic variants (with frequency > 1%). In its final phase [3, 192], the project covered the sequencing of 2504 individuals from 26 populations, identifying 88 million variants. The project provides a solid database for the verification of variants obtained in DNA sequencing analysis projects, covering > 99% of single-nucleotide polymorphism variants with frequency > 1%. Another notable large-scale sequencing project was led by the Exome Aggregation Consortium (ExAC) [104], performing exome analysis of 60,706 human individuals of diverse ancestries. The compiled catalogue of genetic diversity contains an average of one variant every eight bases of exome, forming a comprehensive database for mutation verification. On the other hand, the Pan-Cancer Analysis [205] project, a large-scale international project led by The Cancer Genome Atlas (TCGA) Research Network and the International Cancer Genome Consortium (ICGC), focuses on the extensive sequencing and analysis of human tumors with the aim of understanding the underlying genetics of cancer.

Apart from international and world-scale projects, some nation-wide projects have also turned out to be important. For example, whole-exome sequencing of 2000 Danish individuals has helped to better understand the role of rare coding variants in type 2 diabetes [119]. Similarly, whole-genome sequencing of 250 trios from a Dutch population, performed by the Genome of the Netherlands (GoNL) Project [61], gave better insights into the genetic diversity and population structure of the Dutch population and provided a look into the history of migration. On the other hand, the recent UK10K Project [199] performed whole-exome and whole-genome sequencing and analysis of 10,000 UK individuals, with a focus on rare and low-frequency variants linked with diseases.

Such large-scale studies are critical for our understanding of the genetic diversity of human populations and of various diseases. Since DNA sequencing is one of the fundamental bases of precision medicine [10], re-sequencing experiments will, hopefully, become a commodity one day. Moreover, some scientists predict that by 2019 the genetic mapping of babies at birth will become a common approach [73]. Still, however, there are multiple challenges to solve, especially in the efficient analysis and storage of large amounts of sequencing data from various experiments.

1.1.3 DNA re-sequencing data processing

Goals and difficulties

Since two individuals of the same species share a nearly identical genome sequence (in the case of humans, ~99.5% identity [107]), comparing one's DNA sequence (or its fragments) with an established and known universal sequence template for the species can give a better insight into one's underlying biological characteristics and functions. Hence, the goal of human DNA re-sequencing experiments is to explore the genetic differences found in individuals, families or populations, particularly with respect to human genetic diseases [187]. This is usually done by comparing an individual's DNA sequence (or parts of it) with a reference sequence to find the possible differences. This is followed by further comprehensive validation steps and, in the case of a study comprising multiple individuals, combined analyses. The procedure should ideally reveal single-nucleotide variants (SNVs or SNPs⁵), small insertions or deletions (INDELs), larger structural variants (SVs) and copy-number variants (CNVs), finally leading to meaningful clinical results. In the context of precision medicine, understanding the patient's underlying genetic variants is crucial to providing optimal treatment for many complex diseases [10].

As the design of a study depends on the biological hypothesis in question, different sequencing strategies and protocols can be used. There are three common scenarios for human genetic analyses [155] – identification of inheritable, Mendelian disorders (germline mutations), identification of genetic mutations appearing in cancer cells (somatic mutations), and identification of candidate genes in complex diseases for further studies. Other, less frequent studies involving re-sequencing may include genome-wide association studies, aiming to explore genome-wide sets of variants across different individuals, and recreational or genealogical studies of families. On the other hand, two of the most popular sequencing strategies are whole-exome sequencing (WEX) and whole-genome sequencing (WGS). The difference between the two is that the former concentrates only on capturing and sequencing the genomic coding regions, which constitute only a small fraction (~1% [155]) of the whole genome. Therefore, depending on the study, WEX experiments obtain a higher sequencing depth, while being up to an order of magnitude less expensive than WGS. The recommended sequencing coverage depends on the study, with 30–40× coverage for WGS being considered a “standard”, whereas for WEX it is 100–200×, with 200× being the recommended value for clinical applications [187, 35]. Moreover, for studies analyzing somatic mutations, the required coverage for both WGS and WEX needs to be even higher in order to be able to assess the low-frequency tumor-only variants [68].

5 Technically, a SNP is a SNV that is considered a common variant present within a population; SNV is used as the general term to denote a single-nucleotide variant.

Having multiple sequencing platforms with different characteristics available on the market, selecting one that is suitable for the needs of the experiment is already a difficult task. Regardless of the choice, modern sequencers generally output massive amounts of short DNA sequences coming from the fragmented input DNA molecule. This molecule, in the end, needs to be reconstructed in order to perform a meaningful analysis. The length of the generated reads is relatively small (tens to thousands of base pairs) in comparison to the genomic features being studied (which can span tens of thousands to billions of base pairs [145]). This is one of the biggest technological limitations, and it greatly complicates the variant discovery process. With the variety of sequencing platforms, each having different sequencing process characteristics, different sequencing strategies and sequencing protocols, a variety of bioinformatic data processing tools have been developed to aid genomic analysis. Therefore, the current bottleneck of sequencing experiments lies in the sophisticated computational data analysis and data management [182].

From raw sequenced reads to genomic variants

In order to obtain meaningful clinical information from the sequenced biological sample, a series of data cleanup, processing and evaluation stages are necessary, together forming a pipeline. Figure 1.3 presents a generalized DNA re-sequencing data processing and analysis pipeline, with each stage shown alongside the data formats it uses. The most challenging part of the pipeline, both from the perspective of computational complexity and of data management, is the data processing stage. Our focus, hence, is on the data processing stage and the data formats used (the formats are described in detail at the end of this section).

After completing the sample preparation stage and sequencing in the laboratory, massive amounts of digitalized raw biological data are produced in the process (which, due to current technological limitations, is not a completely lossless one). Large collections of short DNA reads are finally stored in the FASTQ [34] file format. Single or multiple FASTQ files can be generated per experiment. For a sample human re-sequencing experiment, the size of the generated FASTQ files can be about 20 GB (WEX, ~100× coverage) or around 200 GB (WGS, ~30× coverage). As the initial raw reads generated by the instrument can contain errors, quality assessment of the produced data needs to be performed.
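As a rough sanity check of these figures, the expected FASTQ size can be approximated from the target size, the aimed coverage and the per-base overhead of the format. The sketch below is only a back-of-the-envelope estimate under stated assumptions; the target sizes, read length and per-read overhead are illustrative values, not exact ones.

    # Rough, illustrative estimate of raw FASTQ size for a re-sequencing experiment.
    def approx_fastq_gb(target_bases, coverage, read_len=100, id_len=40):
        total_bases = target_bases * coverage
        # Each base costs roughly 2 bytes (one sequence and one quality character);
        # each read additionally carries an identifier line and a '+' line.
        per_read_overhead = id_len + 2 + 4        # title, '+', newlines (approximate)
        n_reads = total_bases / read_len
        total_bytes = total_bases * 2 + n_reads * per_read_overhead
        return total_bytes / 1e9

    print(approx_fastq_gb(3.2e9, 30))    # WGS at ~30x: on the order of 200 GB
    print(approx_fastq_gb(60e6, 100))    # WEX (assuming a ~60 Mbp target) at ~100x: tens of GB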

Figure 1.3: General concept of DNA re-sequencing pipeline.

At the next stage, the input DNA molecule sequence is reconstructed from the short reads. This reconstruction is, so far, the most computationally expensive data processing step. In re-sequencing pipelines, the most common way to achieve it is to map the short reads to the studied organism's universal reference sequence, which is used as a template for reconstruction. The result of the mapping process is a collection of alignments, typically stored in the human-readable text Sequence Alignment / Map (SAM) [111] format. The generated files occupy more space than the initial or pre-processed FASTQ files: example SAM files can occupy ~30 GB (WEX) or ~300 GB (WGS), and their size can still grow during the alignment post-processing stage.

The initial alignments may still require optional quality assessment, cleaning up and post-processing. After that, one can proceed to the variant discovery and evaluation stage. Depending on the biological hypothesis in question and the experiment design, a different variant discovery strategy may be used. The summary of variant discovery results is stored in Variant Call Format [39] (VCF) files. In this final stage, the resulting files can occupy ~30 MB (WEX) or ~6.5 GB (WGS), being only a small fraction of the size of the input raw FASTQ reads files or of the (temporary) SAM alignment files.

Since the introduction of the first massively parallel sequencing platforms, it took a significant amount of research effort until the first reliable genomic data processing pipelines could be established and became able to provide high-quality and reproducible results. However, the exact design of the processing stages in the pipeline depends highly on the variant caller and the study. Currently, the most commonly used pipelines are the ones based on application suites like the Genome Analysis Toolkit (GATK [44, 139]) and SAMtools [111]. The recommended pipeline guidelines to follow are described in the GATK Best Practices6 [11] and the SAMtools workflows7. SAMtools (with the SAM and VCF formats) and GATK have been successfully applied to the analysis of the 1000 Genomes Project [1, 2] and since then these tools have been seen as a de facto standard, which is being constantly improved. Notably, a lot of research has recently been put into the development of fully integrated pipelines, which try to overcome the problems and limitations of the existing ones and to provide high-performance data processing protocols and scalability [166, 122, 40].

Quality assessment

Quality assessment is the first and crucial stage, as the raw data generated by sequencing platforms can contain different artifacts, which are difficult to trace or can cause problems at further pipeline stages [155]. These could have been introduced before or during the library preparation and amplification step (e.g., adapter contaminant sequences) or during the sequencing run itself. Moreover, some nucleic acid sequences are known to raise error rates for most of the sequencing platforms (which otherwise are characterized by additional specific sequencing error profiles). These include, e.g., regions with very high GC-content (the overall fraction of 'G' and 'C' nucleobases present in a given context) or long homopolymer sequences. One also usually observes the decay of the base signal along the read [95]. This has a direct impact on the base calling performed by the machine, where values specifying a low certainty of the called base are more likely to appear at the ends of the sequenced reads. These reported values are known as quality scores.

6 http://www.broadinstitute.org/gatk/guide/best-practices
7 http://www.htslib.org/workflow/

Table 1.2: Sample values of Phred quality scores and their corresponding error probabilities.

Phred Q-score   Probability of incorrect base call   Base call accuracy
10              1 in 10                              90%
20              1 in 100                             99%
30              1 in 1,000                           99.9%
40              1 in 10,000                          99.99%
50              1 in 100,000                         99.999%

Given the probability P that a base was called incorrectly, the quality score Q of the called base is logarithmically related to P, reported on the Phred scale [54]:

Q = -10 \log_{10} P .    (1.3)

Table 1.2 presents sample quality scores on the Phred scale with their corresponding base-calling error probabilities.
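To make the relation in Eqn. 1.3 concrete, the short sketch below (a minimal illustration, not part of any pipeline) converts between a Phred quality score and the corresponding base-calling error probability; it reproduces the values listed in Table 1.2.

    import math

    def phred_to_error_prob(q):
        # Q = -10 * log10(P)  =>  P = 10^(-Q/10)
        return 10 ** (-q / 10)

    def error_prob_to_phred(p):
        return -10 * math.log10(p)

    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_prob(q)
        print(q, p, "accuracy: %.5f" % (1 - p))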

The common strategy applied during the quality assessment stage is filtering and trimming of the raw reads. The quality score of each nucleobase present in the reads is evaluated, and either the reads not meeting the required standards are removed, or the sequence (with its base quality scores) is trimmed accordingly at its end. To automate the quality assessment process, a variety of standalone tools have been developed for different sequencing platforms. Some, in addition to implementing the basic reads preprocessing just mentioned, add experiment report generation with rich statistics visualization features. This step can also give a preliminary insight into the quality of the sequencing experiment and can reveal more information about potential problems that occurred during the sequencing itself (e.g., sample contamination or sequencing errors deduced from meta-data). Moreover, the information obtained can be useful to better adjust further analysis stages in the data processing pipeline. The most commonly used tools for raw sequencing reads quality assessment are FastQC8, FASTX Toolkit9, PRINSEQ [183], and NGS QC Toolkit [157].

8 http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
9 http://hannonlab.cshl.edu/fastx_toolkit/

A more comprehensive overview of the quality assessment stage and the solutions used can be found in [155].
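As an illustration of the basic filtering and trimming strategy described above, the following sketch trims low-quality bases from the 3' end of a read and discards reads that end up too short. It is a simplified, hypothetical example (the threshold and minimum length are arbitrary choices), not a reimplementation of any of the tools mentioned.

    def trim_read(sequence, qualities, min_quality=20, min_length=30):
        # Trim the 3' end of the read while the base quality is below the threshold.
        # `qualities` is a list of Phred scores already decoded from ASCII.
        # Returns the trimmed (sequence, qualities) pair, or None if the read
        # becomes shorter than `min_length` and should be discarded.
        end = len(sequence)
        while end > 0 and qualities[end - 1] < min_quality:
            end -= 1
        if end < min_length:
            return None
        return sequence[:end], qualities[:end]

    # Example: the quality values decay towards the end of the read.
    seq = "AATGCCAGCATTCCTATGCC"
    qual = [38, 38, 37, 36, 36, 35, 35, 34, 33, 32, 30, 28, 25, 22, 20, 18, 15, 10, 8, 5]
    print(trim_read(seq, qual, min_length=10))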

Apart from the standard read filtering and trimming strategies, there also exist more complex approaches. These focus on correcting possible sequencing errors found in the reads in order to improve the accuracy of the subsequent sequence reconstruction and/or variant discovery stage. The selection of an error correction method is highly linked with the sequencing platform, the error profile and the laboratory protocols used, hence a large variety of tools have been developed. A good overview of algorithms and tools used for error correction within the context of the supported sequencing platforms can be found in [95], with comparisons available in [213]. However, in a DNA re-sequencing pipeline they are not commonly used, due to the lack of systematic and comprehensive benchmarks and for practical performance reasons. In the case of de novo assembly, most users also rely on the built-in error handling / correction approaches already implemented in the assemblers [95].

Sequence reconstruction

Since most platforms are unable to provide the full sequences of the supplied input molecules, a computational reconstruction is required. In general, two main classes of sequence reconstruction methods are available – de novo assembly and sequence mapping. The former is used to assemble the reads into (possibly) full-length sequence(s) from scratch. The latter is used to link the reads by aligning them to an existing reference template sequence. Therefore, broadly speaking, de novo assembly can be used to discover the unknown genome sequence of a given organism. This sequence can then be used as a reference when performing sequence mapping in further re-sequencing experiments.

The goal of de novo assembly is to build the longest sequence(s) from the supplied short reads, which will hopefully represent the analyzed genomic fragment. In general, assembly approaches rely on the simple assumption that DNA fragments sharing high similarity originate from the same genomic region [145]. By this principle, similar short reads are grouped together, creating larger contiguous sequences – contigs. During this step, to achieve a sufficient overlap between the reads, a significant depth of sequencing is desirable – the required depth also depends on the available read length.

The longer the reads, the better the overlaps that can potentially be found between the fragments, thus lowering the required sequencing coverage. After the initial assembly step and using some auxiliary information (e.g., paired-end or mate-pair reads, or very long reads from third generation sequencing technologies), contigs are joined together to create even longer sequences – scaffolds. Due to problems with assembling difficult genomic regions (especially the highly repetitive ones), insufficient coverage or sequencing errors, gaps may appear inside the scaffolds created in this way. Finally, the scaffolds are put together into linkage groups or – when assembling whole chromosomes or genomes – they are placed on the chromosomes. The process of assembling genomes can be further aided by providing auxiliary information for improving the linkage between contigs. This information may include, e.g., DNA sequences from already pre-assembled genomes of related species [51] or genome-wide maps of DNA sequence fingerprints (known also as optical mapping [194]). The final result of the assembly process is stored in (or converted to) multiple FASTA files, representing scaffolds or chromosomes. As a side note, the official currently available human genome reference sequence10, version 38, contains multiple sequences, not limited to chromosome sequences or mitochondrial DNA. It also contains a number of auxiliary unplaced contig sequences, since the human genome contains a lot of repetitive and difficult to assess regions, especially around centromeres. A comprehensive review of de novo sequence assembly challenges, main algorithmic approaches and tools can be found in [51, 145], with comparisons and benchmarks in [178, 21, 146].

In NGS re-sequencing experiments, however, the other type of initial sequence reconstruction is preferred, namely sequence mapping. As two individuals of the same species share a nearly identical genome sequence, a previously assembled genome sequence can be used to aid this re-assembly process instead of reconstructing the input molecule sequence directly from scratch. Hence, in the read mapping procedure, the short reads are aligned to the provided reference sequence, with the goal of tracking their locations of origin. Since the generated reads contain sequencing errors and differences with respect to the reference (the part we want to assess), the mapping task boils down to solving an approximate string matching problem. This process is computationally far less expensive than de novo assembly. Hence, our focus is on re-sequencing experiments, as these are the most common.

10 https://www.ncbi.nlm.nih.gov/grc/human

To efficiently deal with the processing of the massive amounts of inaccurate input data, two main algorithmic ideas are used in modern mappers – indexing and filtering [168]. Indexing is primarily used to build a binary representation of the reference sequence in order to perform fast string matching, searching for the origin(s) of a given string. Optionally, an index can be built on the read itself to allow for faster access to its substrings. There are basically two categories of indexing methods [112] – hash-table-based and suffix-tree-based. The idea behind hash-table-based methods is to keep the position(s) of each k-mer (i.e., any possible subsequence of length k) of the reference in a hash table, with the k-mer being the key. Although naive implementations of hash tables only allowed for exact matches, more sophisticated ones also make it possible to find inexact matches. Some modern mappers from this category are MAQ [113], mrFast [9] / mrsFast [69], and Novoalign11.
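A minimal sketch of the hash-table-based indexing idea follows: every k-mer of the reference is stored in a dictionary together with the positions at which it occurs, so that exact occurrences of a query k-mer can be located in (expected) constant time. This is a toy illustration of the principle only, far from the engineering found in the mappers cited above; the reference string is a fragment of the sequence shown later in Fig. 1.4.

    from collections import defaultdict

    def build_kmer_index(reference, k):
        # Map every k-mer of the reference to the list of its starting positions.
        index = defaultdict(list)
        for pos in range(len(reference) - k + 1):
            index[reference[pos:pos + k]].append(pos)
        return index

    def lookup(index, kmer):
        return index.get(kmer, [])

    reference = "ACAGGACACAGTAAAGGGTGAGACAGCACC"
    index = build_kmer_index(reference, k=5)
    print(lookup(index, "ACAGG"))   # exact positions of the k-mer in the reference
    print(lookup(index, "GACAC"))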

Another class of indexing methods is based on a suffix tree, a data structure which is used to store all suffixes of a given string with their corresponding locations. Once constructed, the suffix tree allows for fast searches for exact matches of substrings, these being starting points for finding inexact ones. However, constructing and storing it in memory requires a significant amount of resources, hence optimized derivatives were developed, namely the enhanced suffix array [5] and the FM-index [55]. Indexing methods based on the FM-index have gained the greatest popularity, mainly due to their small memory footprint, while still providing satisfactory search capabilities (compared to methods based on suffix arrays, which are typically faster). In the end, inexact matching algorithms based on the FM-index provide a good tradeoff between search time and memory usage and have been implemented in modern, commonly used mappers, namely: BWA-MEM [110, 108] (recommended by the GATK Best Practices), Bowtie2 [101, 100], SOAP2 [115, 116], or GEM [127].

As for filtering approaches, they can quickly exclude large regions of the reference sequence where no satisfying matches can be encountered. The filters are typically based on one of two lemmas [168] – the q-gram (or k-mer) lemma and the pigeonhole lemma. Both lemmas define necessary conditions to find a candidate match for any given genomic position within a specified error level.

11 http://www.novocraft.com/

In general, when using the k-mer lemma, for a given length k, all (or some selected) overlapping k-mers of the read are extracted. Next, the positions in the reference matching these k-mers are located using the preferred indexing method. The locations which share the largest number of matching k-mers with the reference and which are within the specified maximum error level are considered candidate matches.

The pigeonhole lemma describes a simpler approach. Given a possible maximum number of e mismatches, the lemma states that if the read is divided into e + 1 disjoint substrings (seeds), then at least one substring will match without errors [12]. The read is hence divided into (possibly) equal-sized non-overlapping seeds and, for each of them, matches are searched for in the reference sequence. Modern mappers, however, use a mixture of both lemmas, which further increases the speed and/or precision of the matching [108, 127, 100, 9].
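A minimal sketch of the pigeonhole idea, under the simplifying assumptions of exact seed lookup on a small toy reference: the read is split into e + 1 non-overlapping seeds, each seed is searched for exactly, and every hit yields a candidate mapping position for the whole read, to be verified later. The sequences are illustrative only.

    def split_into_seeds(read, e):
        # Split the read into e + 1 (roughly) equal, non-overlapping seeds.
        n = e + 1
        size = len(read) // n
        seeds = []
        for i in range(n):
            start = i * size
            end = start + size if i < n - 1 else len(read)
            seeds.append((start, read[start:end]))
        return seeds

    def candidate_positions(read, reference, e):
        # Pigeonhole: with at most e mismatches, at least one seed matches exactly.
        candidates = set()
        for offset, seed in split_into_seeds(read, e):
            hit = reference.find(seed)
            while hit != -1:
                candidates.add(hit - offset)   # implied start of the whole read
                hit = reference.find(seed, hit + 1)
        return sorted(c for c in candidates if 0 <= c <= len(reference) - len(read))

    reference = "ACAGGACACAGTAAAGGGTGAGACAGCACCTGCGTCAGCA"
    read = "ACACAGTAAAGGTTGA"    # read with a mismatch w.r.t. the reference
    print(candidate_positions(read, reference, e=2))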

Having obtained the initial set of candidate matches, it is next validated using dynamic programming approaches. Validation is done by computing an edit-weighted distance [212] with algorithms such as Smith-Waterman [189] or Needleman-Wunsch [147], to assess the similarity between the read and the reference at each potential mapping position. The locations that pass the verification step (e.g., those having less than the specified maximum number of possible substitutions, insertions or deletions) indicate the possible set of final mapping locations. In many mappers this verification step is the main bottleneck [168], hence further optimizations are greatly desirable. Due to the highly repetitive nature of DNA fragments in the genome, auxiliary data from paired-end or mate-pair reads (if available) is used to help with the validation procedure. In this way, a read composed of repetitive sequence(s) can be placed more precisely if its mate read was placed unambiguously. However, it may often happen that, when mapping reads to complex genomic regions, multiple possible alignments will be reported.
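For the verification step, a textbook dynamic-programming formulation can be sketched as follows; this is a plain, unoptimized Smith-Waterman local alignment score with simple match/mismatch/gap weights (the scoring values are arbitrary illustrative choices), whereas production mappers use heavily vectorized and banded variants.

    def smith_waterman_score(read, ref, match=2, mismatch=-2, gap=-3):
        # Local alignment score between a read and a reference window (no traceback).
        rows, cols = len(read) + 1, len(ref) + 1
        prev = [0] * cols
        best = 0
        for i in range(1, rows):
            curr = [0] * cols
            for j in range(1, cols):
                diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
                curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
                best = max(best, curr[j])
            prev = curr
        return best

    # Verify a candidate location reported by the filtering stage.
    window = "ACACAGTAAAGGGTGA"
    read = "ACACAGTAAAGGTTGA"
    print(smith_waterman_score(read, window))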

Finally, the reads with their mapping information (usually called alignments) are stored record-wise in (or converted to) the SAM format [111]. At this processing stage, there is no requirement for the alignments to be sorted by position inside the output SAM file. As the sequence re-assembly stage is highly parallelizable, multiple FASTQ files coming from a single experiment (e.g., produced by different sequencing lanes, or an initial large file split into a number of smaller chunks) can be processed simultaneously, producing as an output the corresponding number of SAM files.

Alignment post-processing

The first step in the alignment post-processing stage is to merge all the mapping results into one file (in the case of alignments spread across multiple SAM files) and to sort the alignments according to their calculated genomic position. In this way, the alignments can be accessed by their mapping position, which is essential for further data processing and analysis. For example, with the alignments sorted by their genomic position, alignment quality assessment can be performed and the obtained depth of coverage can be calculated from the data in order to compare the results with the experiment setup. The resulting file can optionally be split by chromosome, and post-processing with further variant discovery can be done independently in parallel [164]. Finally, the SAM files can be compressed into the BAM format (which is a compressed SAM representation, explained in Section 2.2.2) and indexed, as the further post-processing and variant discovery tools usually support working with alignments stored in such a form. As a side note, although the mapping process is highly parallelizable and the alignments can be processed independently chromosome-wise (split into a number of separate SAM files), merging and sorting all the initially generated alignment files is a global operation on all the data, and remains one of the data-intensive bottlenecks in the pipeline.

If PCR amplification was used during the library preparation process, a fraction of the DNA molecules could have been duplicated and sequenced more times than expected, introducing some information bias into further analyses. There are also other possible cases in which reads could have been read and sequenced multiple times (e.g., optical duplicates coming from the same cluster location in the same flowcell) due to technical problems with the instrument. As these reads usually share the same genomic location and (possibly) the same sequence, the goal is to find and mark (or remove) them. Hence, this step is called marking duplicates [11]. Several kinds of auxiliary information, such as paired-end data, alignment direction, or sequencing depth, are used to determine potential duplicates.

Quite often, during the sequence mapping stage, ambiguous alignments can be reported (e.g., reads mapping "well" to multiple locations – the so-called problem of multiple alignments), especially when reads map to complex genomic regions with a high degree of sequence repetition. In such cases, reads with identical sequences mapped to the same genomic region may have different alignments reported with respect to the reference sequence, especially around potential INDELs. Hence, the indel realignment [11] procedure identifies the most suitable placement of the reads, analyzing all the reads within the same genomic region context and normalizing their alignments. In the case of standard workflows studying germline mutations, this step may have only a small effect when using high-coverage data [109]. Therefore, it is sometimes omitted, especially as it is computationally expensive. As a side note, when somatic mutations are studied, this step usually entails a more expensive analysis, as both tumor and normal sample data must be jointly analyzed and realigned [181].

As mentioned for the quality assessment step, the quality scores reported by sequencing machines (during the base calling process) are prone to contain systematic (non-random) technical errors. Therefore, they may introduce some bias during the variant discovery stage, in which they are used as one of the sources of information. To enable the integration of data generated by different platforms, the quality scores need to be fine-tuned [44]. Hence, base quality score recalibration (BQSR) [11] is a process adjusting the quality scores using machine learning to model the errors empirically. As this method is computationally expensive and may have little impact on high quality, high-coverage data [109], this step is not always applied. An interesting benchmark comparing the results of germline variant discovery using selected tools with and without prior indel realignment and BQSR is presented in [161].

It is important to mention that in standard alignment post-processing workflows, some additional data may be attached to each original record after each step is performed. This data, including, for instance, the original base quality score values before the recalibration step, is usually stored in SAM format optional fields12. Therefore, the size of the final file is usually larger than the initial one generated directly by the alignment stage. In basic implementations of post-processing routines (e.g., following the GATK Best Practices), after each step a new, modified SAM (or BAM) file is generated.

12 http://samtools.github.io/hts-specs/

Hence, alignment post-processing entails intensive data I/O usage and creates a significant number of temporary files.

Variant discovery and evaluation

Having cleaned up the alignments and prepared them as an analysis-ready SAM (or BAM) file, the crucial remaining task is to identify the sites where the data displays variation relative to the reference genome [11]. This is also known as variant calling. The choice of the applied strategies for genotype calling, somatic mutation identification or structural variant exploration is, however, highly related to the study design, and there exist plenty of tools to choose from. Tools for variant identification can basically be grouped into four categories [155]: germline callers, somatic callers, CNV callers, and SV callers. Nevertheless, as CNVs refer to intermediate-scale SVs [220] and SV detection strategies can also be applied to detect CNVs, for simplicity we will consider them together.

The detection of germline mutations is central to finding the causes of common and rare diseases [155] and it is also the most frequently performed type of study. The variant discovery procedure focuses on assessing single nucleotide variants and very short insertions/deletions with respect to the reference sequence. The underlying variant detection methods are usually based on Bayesian inference, whereas local read re-assembly methods can be used to aid the detection of INDELs [44, 170]. The most commonly used variant callers are GATK [44, 139] HaplotypeCaller (recommended by the GATK Best Practices), FreeBayes [60], SAMtools [111] mpileup (together with BCFtools13), and Platypus [170], to name a few. A good overview of variant callers with their benchmarks can be found in [77, 109].

Cancer studies focus on somatic mutation calling by performing variant analysis of tumor/normal sample pairs from one subject [155]. In contrast to germline mutation discovery, somatic mutations are much more difficult to detect. Depending on the type of tumor and its stage, it may not have a uniform genome as a whole – the tumor may be composed of multiple different small tumor clones, with each clone containing different mutations. Therefore, the frequency of the variants present in the tumor will be low in the total population. Moreover, the alterations may also be present only in a small fraction of the DNA material originating from the specific genomic region [29].

13 https://samtools.github.io/bcftools/

Therefore, it is necessary to perform an analysis of germline variants in order to discover somatic ones. Hence, additional read analysis, filtering and processing approaches may be required during the pipeline post-processing stage. Some of the notable somatic variant callers are MuTect [29] (recommended by the GATK Best Practices), Strelka [181], FreeBayes [60], VarScan 2 [90, 91], and Platypus [170]. A good overview of somatic variant callers with benchmarks is available in [53, 7].

The last type of study focuses on discovering large structural alterations inside the genome, as they impact phenotypic diversity and may play an important role in diseases [195]. These alterations, including large deletions, translocations, inversions, tandem duplications, copy-number variants, or novel insertions, can range in length from kilobases to large chromosomal-level alterations [8]. Due to the limitations of NGS technologies (they can only produce reads which are short when compared to the genomic features being studied), these variants are the most difficult to detect. Reliable detection hence poses big algorithmic and computational challenges. In addition to the general alignment post-processing stage, some variant callers may implement their own, more sophisticated data processing steps or even perform alternative processing – e.g., full de novo assembly from short reads [117]. Notable tools for detecting SVs include: Pindel [215], Delly [167], BreakDancer [27], Lumpy [102], or Manta [28]. A good overview of SV detection challenges can be found in [8] and a comparison of tools with benchmarks in [160, 220].

The raw results of variant calling are typically stored in VCF format. Since the obtained raw results will most probably contain some bias, they may need to be post-processed. This may include refinement of variant probabilities to fine-tune the balance between specificity and sensitivity [11]. This is proposed in the GATK Best Practices for germline variant discovery as the variant quality score recalibration (VQSR) stage. Furthermore, variants may then need to be filtered to narrow the range of interest or to eliminate false positives (by assessing the results using external resources, e.g., databases). In general, the final variant annotation and verification strategies depend on the experiment and study design.

>hg19_dna range=chr1:3548729-3548935
ACAGGACACAGTAAAGGGTGAGACAGCACCTGCGTCAGCACAACTGACCG
TTCCTTGTCGCCAGGAAGTAGCTGTCAGGACTAAATGCCAGCATTCCTAT
GCCGATTTTCGGGTTTGCTCTGTCGGTAACAGGTTTCAGTGTCTGTAAGG
AGACTGGGACAGAGGCGATCTCATCTAGAACACCAACAGGAAGAACACGC
CATTGTC

Figure 1.4: A sample FASTA format record. The sequence represents a genomic region on chromosome 1, from position 3548729 to 3548935 of the human genome reference sequence version hg19. The specially marked nucleobase G in line 4 resides at position chr1:3548832.

Common data formats

The most common genomic data representation file formats used in DNA re-sequencing pipelines are FASTA, FASTQ, SAM, and VCF. All of them are ASCII text-based, human-readable formats. For practical reasons, the data are usually physically stored in compressed form (explained in detail in Section 2.2.2). Generally speaking, the formats are defined by an optional header section (containing meta-information) followed by a records section. The biological information is represented in a record-wise manner, where single or multiple records can be stored in one file. The descriptions of the formats, the relations between them, and their applications are as follows.

FASTA format FASTA format is used to store biological sequences. Fig. 1.4 shows a sample FASTA record. The record begins with an optional single-line sequence description (starting with the '>' symbol; the content after the first space character is considered an optional comment), followed by line(s) of sequence data – nucleotides or amino acids. The sequence is encoded in characters following the standardized IUPAC notation [36], including single-letter codes with possible lowercase letters. Since the length of the sequence can vary from single bases to kilo- or mega-bases, it can be broken up to span multiple lines. Usually, in a DNA re-sequencing pipeline, the reference sequences of the analyzed organisms are stored in FASTA format. For example, when working with human data, each reference chromosome can be stored in a separate FASTA file.
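A minimal FASTA reader is shown below purely to illustrate the structure just described (a description line, then a sequence possibly wrapped over several lines); it performs no validation, and the file name in the commented usage is hypothetical.

    def read_fasta(path):
        # Yield (identifier, comment, sequence) tuples from a FASTA file.
        name, comment, chunks = None, "", []
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line.startswith(">"):
                    if name is not None:
                        yield name, comment, "".join(chunks)
                    header = line[1:].split(" ", 1)
                    name = header[0]
                    comment = header[1] if len(header) > 1 else ""
                    chunks = []
                elif line:
                    chunks.append(line)
            if name is not None:
                yield name, comment, "".join(chunks)

    # for name, comment, seq in read_fasta("chr1.fa"):
    #     print(name, comment, len(seq))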

FASTQ format FASTQ format [34] is used to store biological sequences with their corresponding (base-calling) quality scores. Each record in FASTQ format is usually simply called a read. Fig. 1.5 shows a sample record.

@HWI-D00119:97514
AATGCCAGCATTCCTATGCCCATTTTCGGGTTTGCT
+
FFFFFFEEDEEDDEDDDDDDDDDDEEEDDDDDDDDD

Figure 1.5: A sample FASTQ format record. The specially marked nucleobase C in line 2 resides at the 21st position from the beginning of the read, i.e., it has an offset of 20 nucleobases. Snippet based on the exome sequencing data published by Genome in a Bottle (GIAB) [222] (source: ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878/Garvan_NA12878_HG001_HiSeq_Exome/)

The format can be seen as an extension of FASTA, but with some minor modifications. A FASTQ record is represented in the file as 4 consecutive lines containing, in the following order: the read identifier, the sequence, a control line, and the quality scores. The read identifier (also known as the title; starting with the '@' symbol) contains meta-information about the read, the sequencing platform and/or other information related to the protocol used. Since it is a free-format field with no length limit, it allows arbitrary information or comments to be included [34]. Next is the sequence line, which is encoded as in the FASTA format but, in special contexts, can also include special gap characters [34]. The control line (starting with the '+' symbol) can be left as only one character, reducing the file size, since historically it could contain a repetition of the read identifier line. Finally, the quality line contains the sequence base-calling quality scores in the Phred scale (see Eqn. 1.3). The numeric values are mapped onto human-readable ASCII characters by adding a constant offset of 33¹⁴. In this way, a sample Phred value of 40 (see Table 1.2) can be represented as the ASCII character of value 40 + 33 = 73, which is 'I'. The FASTQ format allows quality score values of up to 93 to be represented. Finally, the sequence and quality strings are of the same length and, in contrast to FASTA, are represented in single lines15.
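The four-line record layout and the Phred+33 mapping just described can be illustrated with the following sketch, which parses one FASTQ record and decodes its quality string (the record is the one from Fig. 1.5); it assumes well-formed input and a +33 offset.

    def parse_fastq_record(lines):
        # A FASTQ record is exactly 4 consecutive lines.
        title, sequence, control, quality = lines
        assert title.startswith("@") and control.startswith("+")
        phred = [ord(c) - 33 for c in quality]   # Phred+33 decoding
        return title[1:], sequence, phred

    record = ["@HWI-D00119:97514",
              "AATGCCAGCATTCCTATGCCCATTTTCGGGTTTGCT",
              "+",
              "FFFFFFEEDEEDDEDDDDDDDDDDEEEDDDDDDDDD"]
    name, seq, phred = parse_fastq_record(record)
    print(name, len(seq) == len(phred))
    print(phred[:6])    # 'F' maps to Phred score 70 - 33 = 37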

It is important to mention that if the FASTQ reads originate from a library sequenced in paired-end (or mate-pair) configuration, then the paired reads originating from the same DNA fragment are usually stored in a set of 2 files. The pairing information between the reads is normally preserved in their read identifiers, i.e., the two reads share a significant common part of the identifier, differing only in a pair indicator (usually, a number '1' or '2' denoting the read number in the pair).

14 Historically, old Solexa/Illumina sequencers used 64 as the offset.
15 Historically, they could span multiple lines.

Moreover, the pairing between reads is also expressed in the way the file(s) are structured – the reads originating from the same DNA fragment reside on the same lines in two separate FASTQ files or, when the reads are stored in one file, they are stored one after another. Sometimes, however, some reads may have been sequenced unpaired (or filtered, removing one of the reads from the pair), and these are typically stored in an additional, separate file.

SAM format Sequence Alignment / Map (SAM) format [111] is used to store the results of the sequence reconstruction stage, in particular of sequence mapping. It is a tab-delimited text format consisting of an optional header section and a records section. Fig. 1.6 presents a sample part of a SAM file, which consists of 3 header lines and 4 SAM records.

The header section (with each line starting with the '@' symbol) contains meta-information about the alignments, including, e.g., information about the reference sequences used, experiment read groups, the mapper used, or post-processing tools. In contrast to the FASTA and FASTQ formats, each record is stored in a single line, for better consistency of the data representation.

Table 1.3 presents an overview of the SAM record fields. Each record consists of 11 mandatory fields containing essential alignment information and a variable number of optional fields for flexible or aligner-specific information. The SAM format can represent both mapped (usually referred to as alignments) and unmapped reads, where the latter have the mapping information missing and a special indicator assigned. In a very simplified way, a SAM record can be seen as an extension of a FASTQ record with the mapping information, where the fields QNAME, SEQ, and QUAL specify, respectively, the FASTQ read identifier, the sequence and the base quality scores. Such a description is true for unmapped reads, but is more complicated for the aligned reads.

In SAM naming terms, the short DNA fragment (originating from the fragmented input DNA molecule) which is sequenced and later mapped to the reference sequence is called a template. If the library has been created in paired-end mode, two paired DNA sequences can originate from one DNA fragment and are initially stored as two separate FASTQ records (one per each fragment's end). Each DNA sequence originating from one template is referred to as a segment.

Figure 1.6: A sample SAM file with a selected header fragment and 4 records. The alignments are mapped to reference chromosome 1 (RNAME = chr1); the reference sequence used is referenced in the file header (line 2). Some of the alignments have been mapped with mismatches with respect to the reference sequence. The first of the alignments (line 4) represents the result of mapping the FASTQ record previously shown in Fig. 1.5 with one mismatch; the position on which the mismatch has been found in this read has an offset of 20 bases from the beginning of the mapping position chr1:3548812. Based on the exome sequencing data published by GIAB [222].

Table 1.3: A brief description of SAM record fields. Source: the official SAM format specification from: https://samtools.github.io/hts-specs/.

Col  Field   Type    Brief description
1    QNAME   String  Query template name
2    FLAG    Int     Bitwise flag
3    RNAME   String  Reference sequence name
4    POS     Int     1-based leftmost mapping position
5    MAPQ    Int     Mapping quality
6    CIGAR   String  CIGAR string
7    RNEXT   String  Ref. name of the mate/next read
8    PNEXT   Int     Position of the mate/next read
9    TLEN    Int     Observed template length
10   SEQ     String  Segment sequence
11   QUAL    String  ASCII of Phred-scaled base quality + 33
12   OPT     String  A collection of tab-separated optional fields

What is important to note is the fact that, during the mapping process, in some special cases the segments can be broken into multiple smaller ones, each mapping to a different locus (e.g., chimeric reads). Therefore, one fragment can be composed of multiple smaller segments. Moreover, for some reads which map to complex and difficult genomic regions, the mapper can report multiple possible mapping positions – these reads will be represented as alternative alignments. In such cases, the content of the input FASTQ record is duplicated and stored in multiple SAM records. A read which is reported to fully map to only one position in the reference (with possible mismatches, trimming, etc.) is called a linear alignment. Therefore, the number of SAM records (including alignments and unmapped reads) after the mapping step can be larger than the number of the input FASTQ records.

Alignment information in SAM format is represented and stored in multiple fields. The FLAG field always contains information related to read properties and the mapping results, represented as a bitwise sum of different flags. If the read has been successfully mapped, RNAME specifies the reference sequence name, POS the position, and MAPQ the quality of the mapping. The CIGAR (Compact Idiosyncratic Gapped Alignment Report) field defines the relation between the resulting alignment sequence and the reference by specifying a sequence of operations that transforms the former to the latter, but only with respect to the inserted, deleted or clipped sequences. The fields RNEXT and PNEXT are related to the name and position of the paired sequences (paired-end or mate-pair, if present), and the TLEN field corresponds to the observed fragment length. Some reads can also be unmapped (or have their paired read unmapped), and the corresponding fields are then left empty (having the value either '*' or 0). The optional fields follow the TAG:TYPE:VALUE format, where TAG is a two-character string naming the field and TYPE is a predefined enumerator defining the format of the underlying VALUE data. There exist a number of custom tags, used by a variety of data alignment and post-processing tools, described in the Sequence Alignment / Map Optional Fields Specification16. Nonetheless, some fields can also be defined by the user, and those are usually the most commonly used ones.
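To make the field layout more tangible, the sketch below splits a single SAM record line into its mandatory fields and decodes a few of the standard FLAG bits defined in the SAM specification; the record itself is a simplified, hypothetical line in the spirit of Fig. 1.6, not copied from a real file.

    # A few of the standard SAM FLAG bits (see the SAM specification).
    FLAG_PAIRED        = 0x1
    FLAG_PROPER_PAIR   = 0x2
    FLAG_UNMAPPED      = 0x4
    FLAG_MATE_UNMAPPED = 0x8
    FLAG_REVERSE       = 0x10

    def parse_sam_record(line):
        fields = line.rstrip("\n").split("\t")
        mandatory = fields[:11]
        optional = fields[11:]
        flag = int(mandatory[1])
        return {
            "qname": mandatory[0],
            "rname": mandatory[2],
            "pos": int(mandatory[3]),
            "mapq": int(mandatory[4]),
            "cigar": mandatory[5],
            "paired": bool(flag & FLAG_PAIRED),
            "unmapped": bool(flag & FLAG_UNMAPPED),
            "reverse_strand": bool(flag & FLAG_REVERSE),
            "optional": optional,
        }

    record = ("read1\t99\tchr1\t3548812\t60\t36M\t=\t3548755\t104\t"
              "AATGCCAGCATTCCTATGCCCATTTTCGGGTTTGCT\t"
              "FFFFFFEEDEEDDEDDDDDDDDDDEEEDDDDDDDDD\tMD:Z:20G15\tRG:Z:1")
    print(parse_sam_record(record))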

As mentioned previously, SAM is usually stored in a compressed form, the Binary Alignment / Map (BAM) format, and, once sorted by position, it can be indexed for fast retrieval of alignments from a range of positions on the reference genome, which is an important feature used when performing variant calling. A more detailed description of the SAM format can be found in the initial publication [111] or in the official specification.

VCF format Variant Call Format (VCF) [39] is a format for storing DNA alteration information such as SNPs, INDELs, and SVs, together with their rich annotations. It is used to store genotyping information from preliminary variant calling results, summary results of the variant analysis of a single re-sequencing experiment, or joint results from population studies. Fig. 1.7 presents a sample part of a VCF file, which consists of 5 header lines and 4 VCF records.

A VCF file consists of a header section and a data (variants) section. The header contains an arbitrary number of meta-information lines (specific to the given dataset), each starting with the characters "##", and a tab-delimited data field definition line, starting with a single '#' character. Each VCF record consists of a set of 8 mandatory fields containing information about the observed variant (or invariant) site in the analyzed population. If the variant information is unavailable, the field is filled with the '.' symbol. Optionally, a VCF file can contain information from multiple samples describing their genotypes.

16https://samtools.github.io/hts-specs/ Chapter 1. Introduction 34 . For . As seen in G/C NIST7035 NIST7086 site (line 8) which is a single nucleotide variant chr1:3548832 3820.44 . AC=2;AF=0.50 GT:GQ 0/1:99 0/1:99 G C chr1chr1chr1 3546935chr1 3548136 rs2821007 3548832 rs2760321 3552780 G rs2760320 T rs2821061 A A C 43.17 G 5136.20 . . 946.44 . AC=2;AF=1.00 AC=4;AF=1.00 GT:GQ GT:GQ AC=2;AF=0.50 1/1:6 1/1:99 GT:GQ 0/1:99 1/1:99 ./. 0/1:99 ##fileformat=VCFv4.1##INFO= count Frequency"> in genotypes"> #CHROM POS ID REF ALT QUAL INFO FORMAT NIST7035 NIST7086 Figure 1.7: A sample VCF file withshown, selected starting header from fragment line and 6. 4 The records. genotyping Theus, information discovered the variants (columns most on 9–11) interesting the is variant chromosome provided was 1 for discovered are two Fig. on samples: 1.6 this variant could have beenexome suggested sequencing by data the published alignments, by yet GIAB at [ 222 ]. this level the information was incomplete. Based on the 35 1.2. Text data compression

Table 1.4: A brief description of VCF mandatory record fields. Based on Variant Call Format Specification.

Col  Field    Type    Brief description
1    CHROM    String  An identifier from a reference genome or pointer to a contig
2    POS      Int     The reference position
3    ID       String  Variant identifier
4    REF      String  Reference base(s)
5    ALT      String  Alternate base(s)
6    QUAL     Float   Phred-scaled quality score for the call
7    FILTER   String  Indicator about filtering status and result
8    INFO     String  Additional information, variant annotations
9    GFORMAT  String  Genotype information format, order and data description
10+  GINFO    String  Genotype information per sample

Table 1.4 presents a description of the record fields. Since our focus is primarily on files stored in the FASTQ and SAM formats (due to their large storage requirements as compared to files stored in VCF format), the detailed format description is not covered here – a complete description can be found in the official Variant Call Format Specification17.

Similarly to SAM, VCF files are usually stored in a compressed form as Binary VCF (BCF). The files are sorted by position and indexed for fast retrieval of variant information. A detailed description of the VCF format can be found in the initial publication [39] or in the most recent official Variant Call Format Specification.

1.2 Text data compression

1.2.1 The aim of data compression

One of the most natural ways of representing information for humans is human-readable text – a sequence of characters. Computers, on the other hand, store and process information in a binary representation. Therefore, any type of information needs to be encoded, where a code can be perceived as a set of rules to convert a given type of information to its digital form and back.

17 https://samtools.github.io/hts-specs/

When encoding text, characters can be encoded using ASCII, Unicode or another encoding standard and stored digitally using one or more bytes per character, depending on the selected standard. As has been shown in the previous section, the most commonly used data formats in genomics still use ASCII characters to represent the information, e.g., DNA sequences, sequence alignments or meta-information. The size of the generated sequencing data is in the order of gigabytes, and a possible reduction of its size is highly desirable.

Data compression is a set of techniques allowing one to reduce the data size, specifically – the number of bits required to store and/or transmit it. Some of the key benefits of compression include a significant reduction of storage costs and more feasible sharing between individuals or research institutes. Compression can be lossless or lossy. In the former case, the decompressed data matches the original. In the latter, a controlled loss of information can be applied, e.g., by lowering the data resolution (like downsampling the quality scores in genomic datasets) or discarding some auxiliary data that is unnecessary from the point of view of further data analysis (like removing selected optional fields from generated SAM alignments). When storing sequencing data, we are mainly interested in performing lossless compression, to be able to precisely assess important clinical information. In some cases, however, we might allow for a controlled degree of auxiliary data loss to further improve the compression factor.

As will be shown in the following subsections, there exists no single "best" method to compress data, using either lossless or lossy methods. Moreover, a method which may be a perfect fit for compressing one type of data (e.g., sequencing-platform-specific raw image or signal data used in basecalling) may not provide satisfactory results when applied to another type (e.g., sequencing data generated by the basecaller). The selection of the compression method may also depend on the requirements of the data processing and analysis pipeline – whether the priority should be on fast random access (with a possibly larger data size) or on maximum space savings (with possibly slow access). Data compression can hence be perceived as both an art and an optimization problem.

Figure 1.8: General concept of the data compression workflow. T1, ..., Tn denote optional data transformation steps, M denotes the modeling stage and C the coding stage.

1.2.2 General data compression workflow

Any data compressor consists of at least a model and a coder. In order to efficiently encode the input message (a sequence of symbols), we need to build an appropriate statistical model of the source of information which generates the messages. However, quite often the characteristics of the source are unknown and, therefore, the model is an approximate representation of the source. Such a model is later used by the encoder to determine the number of bits each symbol contributes to the compressed representation of the message. The more adequately the model describes the source, the better, as the encoder can encode the message more efficiently, i.e., using as few bits as possible. For example, the most frequent symbols in the message can be represented in some binary code using a smaller number of bits than the less frequent symbols.

Moreover, optional data transformation(s) can be performed on the input message as a data preprocessing stage. The goal of a data transformation (or multiple transformations chained together) is to cast the message into an alternative form, which can be encoded more efficiently using a different model. For example, if there are multiple repetitions of some sequence in the message, such a sequence can be encoded only once and each repetition can be specially encoded by pointing to its previous or initial occurrence. The general concept of the compression workflow is illustrated in Fig. 1.8. Decompression can be perceived as the reverse of this process.
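As a toy illustration of the transformation-plus-modeling idea from Fig. 1.8, the sketch below chains a simple run-length transform (one possible T step, chosen here only as an example) with a symbol-frequency model of the kind an entropy coder could use to assign code lengths; no actual coder is attached, and the input string is arbitrary.

    from collections import Counter

    def run_length_transform(message):
        # A simple transformation step: collapse runs of repeated symbols
        # into (symbol, run length) pairs.
        out = []
        i = 0
        while i < len(message):
            j = i
            while j < len(message) and message[j] == message[i]:
                j += 1
            out.append((message[i], j - i))
            i = j
        return out

    def model_frequencies(symbols):
        # A static, order-0 model: relative frequencies of the symbols,
        # usable by an entropy coder to assign code lengths.
        counts = Counter(symbols)
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()}

    message = "AAAAACCCGTTTTTTTAAAA"
    print(run_length_transform(message))   # [('A', 5), ('C', 3), ('G', 1), ('T', 7), ('A', 4)]
    print(model_frequencies(message))      # model built directly on the raw message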

As a side note – although, in a broader context, the data preprocessing stage can be perceived as part of modeling the source (we try to model some unknown characteristics of the information source by casting the input message into a different form), we decided to explicitly distinguish between these two stages. After applying transformation(s) to the input message, the resulting message will have different characteristics, which is equivalent to it being generated by a different source of information. Nonetheless, the boundaries between these two stages are not well defined.

1.2.3 Modeling

Source of information

A message (or multiple messages), which we try to compress, was generated by some source of information whose characteristics are usually unknown. The source can be seen as a stochastic process, characterized by a set of random variables. A stochastic process can be stationary or non-stationary. In the former case, its joint probability distribution does not change in time; in the latter, its characteristics change over time. Typically, when compressing data we assume that the message was generated by a stationary source and, if possible, we model a non-stationary source using a special case of a stationary one.

In order to encode the message efficiently, we first need to build a statistical model of the unknown source. Therefore, a model is an approximate representation of the source that generates the data. Its main purpose is to approximate the probability distribution of the (successive) symbol(s), which will be used by the coder while encoding (or decoding). Multiple classes of sources of information exist; however, in this dissertation we will briefly focus on the most commonly used ones.

Memoryless source

The simplest class of source is the memoryless source. Let Σ = {a_1, a_2, ..., a_σ} be the set of distinct symbols, called the alphabet, with σ specifying its size. Let Θ = {θ_1, θ_2, ..., θ_σ} be the set of associated probabilities of occurrence of the symbols. A memoryless source generates as output a sequence of randomly chosen symbols according to their probabilities. There are no dependencies between symbols and their probabilities; the only regularity in the generated message is that the frequency of occurrence of each symbol is close to Θ, which defines the source characteristics [41].

Another one is the piecewise stationary memoryless source18 (PSMS).

18 Formally, this type of source is a non-stationary source, as when generating messages the current source characteristics vary in time.

When generating the i-th symbol of a message of length l (1 ≤ i ≤ l), the actual set of probabilities of occurrence of symbols depends on the position i in the generated message (the number of symbols generated so far). Given the sequence of probability sets ⟨Θ_1, Θ_2, ..., Θ_l⟩ and the related sequence of positions ⟨t_1, t_2, ..., t_l⟩ in the message, the current state of the source is characterized by the set Θ_i.

For example, as mentioned in Section 1.1.3, the base quality scores reported by, e.g., the Illumina basecaller may degrade along the length of the read being sequenced. The closer to the end of the read a nucleobase is generated, the more probable low quality values become. Therefore, although simplified, in such cases a PSMS can be used to model the probability distribution of the quality scores depending on their position in the reads.

Finite-state machine source

Another very common class of sources is finite-state machine (FSM) sources, with a Markov source being a generalized FSM. A Markov source is characterized by a set of states S = {s_1, s_2, ..., s_m} and a number of conditional transitions between them with probabilities {p(s_i → s_j)}. There can be at most σ transitions from one state to another. Each transition has some probability of being selected and corresponds to generating a different symbol by the source. The next state is determined by the previous state and the current symbol.

Finite-order FSM sources are a subset of Markov sources in which every state is associated with a set of sequences of length no greater than k. Therefore, the current state is specified by the last k symbols, whereas the next state is specified by the current state and the current symbol. k is called the order of the source. The maximum number of possible states is σ^k.

As a side note, the previously mentioned memoryless source can be perceived as a 0-order FSM model having a single state with the corresponding σ transitions and their probabilities Θ.


Context modeling

As already mentioned, usually, when compressing, the source characteristics are not known in advance and the model is built while processing the generated message(s). The probabilities of occurrence of symbols are estimated by analyzing the context in which they occur. The context of the analyzed symbol is usually the preceding k symbols, where k is the order of the model (and the length of the context). For a finite-order FSM source, a straightforward way to estimate the occurrence probability of some symbol s is to count the number of times f that the symbol occurred in the context c and to calculate its conditional relative frequency [74]. For a memoryless source (basically, an order-0 FSM), there is no need to analyze the context in which the symbols occur.

There are two different approaches used to estimate the probabilities of occurrence of symbols while building the model. In the first one, the input message (or a part of it) is analyzed and, depending on the model used, the occurrences of symbols (in a context) are counted. Based on them, an approximate representation of the probability distribution of the source's symbols is calculated, which will be used by the encoder. Such a model is called static. Although simple, static models have some drawbacks. First of all, the message needs to be processed twice – once to gather the statistics for building the model and once to encode the message (the model needs to be available before starting the encoding). Moreover, the model needs to be stored alongside the encoded message, and the amount of information representing the estimates of the model's symbol probabilities grows with the order of the model.

Alternatively, a model can "train" itself while processing the successive symbols of the message, updating its statistics as it goes. Such a model is called adaptive (or dynamic). The message can be encoded (or decoded) directly as the data is being read, and there is no need to store the model with the encoded message. However, the message may not be encoded as efficiently as when using a static model – when encoding starts, the model's probability estimates are empty or set to some initial arbitrary value. Therefore, some initial part of the message will possibly be encoded with an overhead, before the dynamic model converges to the underlying characteristics of the source. However, by not requiring the model to be stored alongside the encoded message, the adaptive model will overall usually perform better than the static one [132]. Hence, the most popular compressors use adaptive modeling.
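A minimal sketch of an adaptive fixed-order context model follows: it keeps symbol counts per context of the k preceding symbols and updates them after each symbol, so the probability estimates that a coder would use evolve as the message is processed. The order and the add-one smoothing are arbitrary illustrative choices, and no actual coder is attached.

    from collections import defaultdict

    class AdaptiveContextModel:
        def __init__(self, order, alphabet):
            self.order = order
            self.alphabet = alphabet
            # counts[context][symbol] -> number of occurrences seen so far
            self.counts = defaultdict(lambda: defaultdict(int))

        def probability(self, context, symbol):
            # Estimate P(symbol | context) with add-one smoothing, so that
            # never-seen symbols still receive a non-zero probability.
            ctx = self.counts[context]
            total = sum(ctx.values()) + len(self.alphabet)
            return (ctx[symbol] + 1) / total

        def update(self, context, symbol):
            self.counts[context][symbol] += 1

    model = AdaptiveContextModel(order=2, alphabet="ACGT")
    message = "ACGACGACGT"
    for i, symbol in enumerate(message):
        context = message[max(0, i - model.order):i]
        p = model.probability(context, symbol)   # what a coder would use at this point
        model.update(context, symbol)
        print(context.rjust(2, "-"), symbol, round(p, 3))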

A model representing an FSM source can be either a fixed-order or a variable-order model, with the former being the simplest one. When encoding a message using an adaptive fixed-order model, exactly k previous symbols are used as a context to predict the current symbol. However, the encoding efficiency degrades with the length of the context used, as many longer contexts appear for the first time and no prediction can be made [124]. One of the solutions is to collect statistics for different orders simultaneously and use the longest matching context to make a prediction of the symbol. A model performing symbol predictions using contexts of different lengths is called a blended model and is usually composed of multiple sub-models [74], for example, consisting of up to k sub-models, each with a different order ≤ k. However, the challenge remains in encoding the symbols (and contexts) that occur in the input for the first time, and in how to indicate switching between the contexts.

The most popular family of compression methods using finite-order variable-length context modeling are the prediction by partial matching (PPM) [32, 143] methods. When encoding the input message symbol by symbol, PPM first tries to find the longest match of k symbols. If no prediction can be performed, it keeps switching to a shorter context, using k − i previous symbols, until a prediction can be made or until no matching symbols remain in the context. When seeing a symbol (and context) for the first time, PPM methods encode a special escape symbol, normally not present in the message's alphabet. This indicates the switch to a shorter context. Next, they update the models with the newly seen context. The way PPM methods handle the escape symbol has led to the development of a variety of PPM variants; one of the most popular, PPMd [186], is currently widely used in several common data archiving tools.
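The context-switching idea can be illustrated with a short sketch. The following Python fragment is a heavily simplified, hypothetical illustration of the fallback mechanism only: it reports which symbols (escapes and the literal) would be emitted, while real PPM variants additionally differ in how escape probabilities are estimated and use exclusion techniques that are omitted here.

    from collections import defaultdict, Counter

    ESCAPE = "<ESC>"   # special symbol, assumed not to occur in the message alphabet

    def ppm_emit(models, history, symbol, k):
        # Return the symbols that would be emitted for `symbol`: escape symbols
        # while falling back to shorter contexts, then the symbol itself.
        emitted = []
        for order in range(min(k, len(history)), -1, -1):
            context = tuple(history[len(history) - order:])
            if symbol in models[order][context]:
                emitted.append(symbol)
                return emitted
            emitted.append(ESCAPE)               # unseen in this context: switch to a shorter one
        emitted.append(symbol)                   # never seen before in any context
        return emitted

    def ppm_update(models, history, symbol, k):
        for order in range(min(k, len(history)) + 1):
            context = tuple(history[len(history) - order:])
            models[order][context][symbol] += 1

    k = 2
    models = [defaultdict(Counter) for _ in range(k + 1)]
    history = []
    for ch in "ACTACTACT":
        print(ch, ppm_emit(models, history, ch, k))
        ppm_update(models, history, ch, k)
        history.append(ch)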

1.2.4 Entropy coding

Shannon entropy

Information entropy, a concept introduced by Shannon [185], provides a measure of the average amount of information generated by some source of information, a stochastic process. It allows us to estimate the lower bound of the lossless compressibility of the message(s) emitted by the source. Let X be a random symbol coming from an alphabet χ = {X_1, X_2, ..., X_n}, representing a message. The entropy H(X) of the message can be estimated as:

H(X) = -\sum_{i=1}^{n} p(X_i)\,\log_b p(X_i), \qquad (1.4)

where p(X_i) is the probability of occurrence of symbol X_i and b is the logarithm base (usually a binary logarithm, b = 2, is used). If the symbol is emitted by a stationary memoryless source, characterized by generating symbols from alphabet Σ = {a_1, a_2, ..., a_σ} with probabilities Θ = {θ_1, θ_2, ..., θ_σ}, then the entropy of the message also corresponds to the entropy of the source; analogously, χ ≡ Σ and p(X_i) ≡ p(a_i) ≡ θ_i.

Let us assume a message X = ⟨x_1, x_2, ..., x_l⟩ of length l representing a random sequence of characters, with the symbols x_j drawn from alphabet Σ. The probability of symbol a_i occurring at position j is p(a_i, j). If the message is generated by a stationary memoryless source, then p(a_i, j) = θ_i. Therefore, the per-letter average entropy of such a message (and the entropy of the source) can be calculated using Eqn. 1.4. To estimate the minimum lossless representation of the message, the calculated entropy needs to be multiplied by its length.

As an example, let us consider a random sequence of characters (a string) X = GCACTTTG generated by a stationary memoryless source. Its length is l = 8 and its symbols come from the alphabet Σ = {A, C, G, T} of size σ = 4. The calculated relative frequencies of the symbols are: p(A) = 0.125, p(C) = 0.25, p(G) = 0.25, and p(T) = 0.375. The information entropy expressed in bits per symbol is H(X) = −(0.125 log_2 0.125 + 0.25 log_2 0.25 + 0.25 log_2 0.25 + 0.375 log_2 0.375) ≈ 1.906. Hence, l·H ≈ 15.25 is the minimum number of bits in which the sequence can be represented (or l·H ≈ 16, rounded up to the nearest integer).
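The calculation above is easy to reproduce programmatically; the following short Python sketch estimates the per-symbol entropy of a message from its relative frequencies (Eqn. 1.4) and, for GCACTTTG, prints the values used in this example.

    import math
    from collections import Counter

    def entropy_bits_per_symbol(message):
        # Shannon entropy (Eqn. 1.4) with relative frequencies as probability
        # estimates and a base-2 logarithm.
        counts = Counter(message)
        n = len(message)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    msg = "GCACTTTG"
    h = entropy_bits_per_symbol(msg)
    print(round(h, 3), math.ceil(h * len(msg)))   # ~1.906 bits/symbol, 16 bits minimum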

However, if the message is generated by a piecewise stationary memoryless source, the probabilities of occurrence of symbols depend on the position j in the message, p(a_i, j) = θ_{i,j}. Moreover, in the case of stationary Markov sources, one needs to take into account the conditional probabilities of occurrence of symbols, which depend on the occurrence of previous symbols. For such sources, estimating the entropy is a bit more complex and it will not be covered here – a detailed description can be found in [41, 184, 84].

Basic prefix codes

A prefix code is a variable-size code which satisfies the prefix property, i.e., no code word is a prefix of any other code word (a sequence of bits or bytes) in the given code system. The sequences used in a code system are called code words. This property allows us to unambiguously store and retrieve code words from a binary information stream. Among the simplest and most commonly used methods of assigning prefix codes are the family of Elias codes (like the Elias gamma code [52]), unary19 codes, and Golomb-Rice [64] codes.

In a unary coding scheme a positive integer x is represented as a series of x − 1 one values followed by a single zero value. Alternatively, and less commonly used, the integer can be represented by a series of 0’s followed by a single 1. The length of such a constructed code word for a given integer x is x bits.

An Elias gamma coding scheme represents a positive integer value x (2^N ≤ x < 2^{N+1}) as a concatenation of two binary sequences. The value of the power N is encoded using unary encoding. Next, the remaining difference x − 2^N is represented in binary encoding using N bits.

In contrast to the previous schemes, Golomb-Rice coding provides a flexible encoding method which allows for creating parametrized prefix codes according to the source characteristics. For a given nonnegative integer x and a tunable parameter M, the encoding parameters are: q = ⌊x/M⌋, r = x − qM and c = ⌈log_2 M⌉, where q and r are the quotient and the remainder parts respectively. Having calculated the parameters, the code is constructed in two parts. The quotient value is represented using a unary code. The remainder is encoded using truncated binary encoding, i.e., if r < 2^c − M then the value r is binary encoded using c − 1 bits, otherwise the value 2^c − M + r is binary encoded using c bits. In Table 1.5 sample Golomb-Rice code words are shown alongside binary, unary, and Elias gamma code words.

19Formally, unary codes are also known as Elias alpha codes, whereas binary codes of fixed length are known as Elias beta codes.

Table 1.5: Sample of 10 code words for binary, unary, Elias gamma and Golomb-Rice coding. For Golomb-Rice coding the parameter M = 10 was used for code word generation. The spaces between bits were added only to improve readability.

Number  Binary  Unary         Elias gamma  Golomb-Rice
1       1       0             1            0 001
2       10      1 0           0 1 0        0 010
3       11      11 0          0 1 1        0 011
4       100     111 0         00 1 00      0 100
5       101     1111 0        00 1 01      0 101
6       110     11111 0       00 1 10      0 1100
7       111     111111 0      00 1 11      0 1101
8       1000    1111111 0     000 1 000    0 1110
9       1001    11111111 0    000 1 001    0 1111
10      1010    111111111 0   000 1 010    10 000
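The three coding schemes can be sketched in a few lines of Python; the fragment below is only an illustration (assuming the same conventions as Table 1.5, including M = 10 for Golomb-Rice) and reproduces the code words listed in the table.

    import math

    def unary(x):
        # x - 1 one-bits followed by a terminating zero (as in Table 1.5)
        return "1" * (x - 1) + "0"

    def elias_gamma(x):
        # N zero-bits (the unary-coded exponent) followed by the (N+1)-bit binary form of x
        n = x.bit_length() - 1                   # N such that 2^N <= x < 2^(N+1)
        return "0" * n + bin(x)[2:]

    def golomb_rice(x, m=10):
        q, r = divmod(x, m)
        c = math.ceil(math.log2(m))
        quotient = "1" * q + "0"                 # quotient encoded in unary
        if r < 2 ** c - m:
            remainder = format(r, "0{}b".format(c - 1))           # truncated binary, c-1 bits
        else:
            remainder = format(r + 2 ** c - m, "0{}b".format(c))  # c bits
        return quotient + remainder

    for x in range(1, 11):
        print(x, format(x, "b"), unary(x), elias_gamma(x), golomb_rice(x))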

As an example, to encode the message X = GCACTTTG using unary encoding we could map its alphabet Σ = {A, C, G, T} onto integer numbers with their corresponding code words as follows: A → 1 (0), C → 2 (10), G → 3 (110) and T → 4 (1110). As a result, the encoded message will be 110 10 0 10 1110 1110 1110 110, which is 23 bits in length. Compared to the theoretically possible information representation of the message, which is approximately 15 bits, such an encoded message carries a significant information overhead of ≈ 8 bits. To reduce the overhead, one of the solutions may be to perform a different mapping of symbols to code words. Knowing the relative frequency distribution of symbols inside the message, we can assign the shorter codes to the symbols with the higher relative frequencies first, i.e., T → 1 (0), G → 2 (10), C → 3 (110), A → 4 (1110). As a result, the message will now be encoded as 10 110 1110 110 0 0 0 10, with a length of 17 bits (≈ 2 bits of overhead compared with the theoretical limit).

Huffman coding

Huffman coding [76] is one of the most popular and widely used methods for data compression. For a given alphabet Σ with relative frequencies F of occurrence of symbols, the algorithm generates optimal prefix codes used to encode the symbols – a code is optimal if no other code with a lower mean code word length exists.

Figure 1.9: Illustration of generating Huffman code words for symbols to encode the sequence GCACTTTG. [The tree assigns: T (0.375) → 0, G (0.25) → 10, C (0.25) → 110, A (0.125) → 111; the internal nodes have relative frequencies 0.375, 0.625 and 1.0 (root).]

The algorithm starts by creating a list of symbols sorted in descending order of their relative frequencies. Next, it constructs a binary tree with leaves representing each symbol, building it from the bottom to the top. At each step, the two leaves (or nodes) with the lowest relative frequencies are merged together into a new parent node, whose relative frequency equals the sum of those of its children. This process continues until the last parent node is created – the root node, with a relative frequency of 1.0. The final code words are then assigned by traversing the tree from the root node to the leaves, appending 0 whenever a left child is selected and 1 whenever a right child is selected.

Fig. 1.9 illustrates how a Huffman tree is constructed for the sample message GCACTTTG in order to assign its code words. Each leaf represents a symbol, its relative frequency and the final deduced code word. The assigned code words for each symbol are: A → 111, C → 110, G → 10 and T → 0. As a result, the Huffman-encoded message is 10 110 111 110 0 0 0 10 and is 16 bits in length. In this case, Huffman encoding is more efficient than unary encoding (i.e., it uses fewer bits to represent the given message), yet the encoded message still has an overhead of ≈ 0.75 bits compared to its theoretical entropy.
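A compact way to see the algorithm at work is the following Python sketch, which builds the tree with a priority queue. It is an illustrative toy implementation: ties between equal frequencies are broken arbitrarily, so the individual code words may differ from those in Fig. 1.9, but the total encoded length for GCACTTTG is the same 16 bits.

    import heapq
    from collections import Counter

    def huffman_codes(message):
        # Build a Huffman tree from symbol frequencies and return a code table.
        heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(message).items())]
        heapq.heapify(heap)
        next_id = len(heap)
        parent = {}                               # node -> (parent node, branch bit)
        while len(heap) > 1:
            f1, _, n1 = heapq.heappop(heap)       # the two nodes with the lowest frequencies
            f2, _, n2 = heapq.heappop(heap)
            parent[n1], parent[n2] = (next_id, "0"), (next_id, "1")
            heapq.heappush(heap, (f1 + f2, next_id, next_id))
            next_id += 1
        codes = {}
        for sym in set(message):
            node, bits = sym, ""
            while node in parent:
                node, bit = parent[node]
                bits = bit + bits                 # collect bits from the leaf up to the root
            codes[sym] = bits
        return codes

    codes = huffman_codes("GCACTTTG")
    print(codes, sum(len(codes[s]) for s in "GCACTTTG"), "bits")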

The presented algorithm describes the static Huffman coding algorithm, which uses relative frequencies of occurrence of symbols precalculated prior to encoding. However, to encode a message “on-the-fly” using adaptive modeling, an adaptive version of the Huffman coding algorithm [89] also exists. The advantages and disadvantages of static and dynamic models have been previously discussed in Section 1.2.3. Although Huffman encoding is a relatively simple and computationally efficient algorithm, it has a strong limitation. As with other prefix-code generation methods, it generates a code with an integral number of bits assigned to each symbol in the given alphabet. For example, a symbol with probability 0.99 carries only H(X) = −log_2 0.99 ≈ 0.0145 bits of information, but using a Huffman code it cannot be encoded in less than 1 bit. Therefore, the only case where it produces ideal variable-size codes (with the average code size being close to the entropy) is for alphabets in which the probabilities of occurrence of symbols are natural powers of 1/2. The maximum difference between the expected code length and the theoretical entropy given by Eqn. 1.4 is bounded by p_m + log_2(2 log_2 e / e) ≈ p_m + 0.086 bits, where p_m is the probability of occurrence of the most frequent symbol [59].

Arithmetic coding

Arithmetic coding [171, 99, 210], as with Huffman coding, encodes the message according to the relative frequencies of occurrence of symbols. However, in contrast to Huffman coding, it encodes the entire message as a single floating point number x, where 0.0 ≤ x < 1.0. In this way, the output symbols can effectively be assigned codes of non-integral length, possibly leading to higher compression.

To encode a message, the method starts by setting an initial interval [0.0, 1.0) (also called the range). Next, it divides the interval according to the relative frequency distribution of the symbols, so that each subinterval corresponds to the appearance of one symbol. After encoding the first symbol, the selected subinterval is further divided according to the relative frequency distribution. In this way, the input message is encoded symbol by symbol, by narrowing the obtained subintervals. As a result, an interval encoding the whole message is obtained – any value within the final interval thus represents the input message in compressed form. The encoding is best shown visually – the encoding of the GCACTTTG sequence is presented in Fig. 1.10.
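The interval-narrowing procedure is easy to follow in code. The toy Python sketch below uses a static model built from the relative frequencies and ordinary floating point numbers (so no renormalization, which limits it to short messages); with the subintervals ordered alphabetically it reproduces, for GCACTTTG, the final interval shown in Fig. 1.10.

    from collections import Counter

    def arithmetic_interval(message):
        # Narrow the [0, 1) interval symbol by symbol using a static model.
        n = len(message)
        freqs = Counter(message)
        symbols = sorted(freqs)                   # fixed (alphabetical) subinterval order
        cumulative, acc = {}, 0.0
        for s in symbols:
            cumulative[s] = acc
            acc += freqs[s] / n
        low, high = 0.0, 1.0
        for s in message:
            width = high - low
            high = low + width * (cumulative[s] + freqs[s] / n)
            low = low + width * cumulative[s]
        return low, high                          # any value in [low, high) encodes the message

    print(arithmetic_interval("GCACTTTG"))        # ~(0.409115314484, 0.40914106369)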

Unfortunately, the arithmetic available in computers offers limited precision in representing numerical values. Encoding a floating point value representing the final interval with possibly unlimited precision is practically impossible. Therefore, while encoding the message, frequent renormalizations (expansions) and proportional reductions of the intervals are required to properly encode the numerical values [177].

Figure 1.10: Illustration of arithmetic coding of the GCACTTTG sequence. The result is the interval [0.409115314484, 0.40914106369).

A more comprehensive description of the algorithm, with supporting examples showing both the encoding steps in detail and the problem of limited arithmetic precision, can be found in [177].

The presented algorithm describes arithmetic coding using a static model. Similarly to Huffman coding, an algorithm using an adaptive model is also available. The strong point of arithmetic coding is that it can achieve optimal encoding performance: as the length of the encoded message increases to infinity, the number of bits required to represent the message converges to its theoretical entropy limit. This is why the most successful compression methods are based on adaptive modeling with arithmetic coding. Some examples include the family of PPM algorithms and the PAQ20 compressor series, where the latter scores top positions in benchmarks and has won multiple awards, including the Hutter Prize21, by achieving the best compression ratio among the available solutions.

Although arithmetic coding can achieve a higher compression ratio than Huffman coding, this unfortunately comes at the cost of higher computational complexity. The initial algorithm was computationally very expensive and heavily patent-encumbered, so an alternative version was designed to partially overcome these issues. A range encoder [134] (although formally also an arithmetic encoder) reduces the number of range renormalizations necessary to encode the message, which significantly decreases the computational complexity. Implementations of the range coding algorithm can achieve a speedup factor of up to 2 compared to implementations of the standard arithmetic coding algorithm [177] (still with a compression speed inferior to that of Huffman encoding). Recently, however, a new family of entropy coders has been developed, namely asymmetric numeral systems (ANS) [47]. They provide encoding efficiency close to that offered by arithmetic coding, yet with a compression speed comparable to (or better than [48]) Huffman coding.

20http://mattmahoney.net/dc/zpaq.html
21http://prize.hutter1.net/

1.2.5 Data transformations

Basic data transformations

The goal behind the data transformation step is to cast the data into an alternative form which can be encoded more efficiently by using a different, better-suited model. Moreover, if applicable, some transformations can be chained together in order to improve the compression ratio even further. A proper choice of which data transformations are to be performed depends on the input data characteristics. However, it may not be a trivial task, especially if the characteristics of the source of information are unknown.

One of the simplest transformations is differential encoding (or delta encoding). Let S = ⟨x_1, x_2, ..., x_n⟩ be a message of length n which consists of numerical values. When S is processed, the algorithm emits only the numerical differences between consecutive numbers, ∆_i = x_i − x_{i−1}. Only the first value needs to be encoded in its original form. Therefore, as a result it gives a tuple (x_1, ⟨∆_2, ∆_3, ..., ∆_n⟩). This method noticeably reduces the entropy of the whole message, especially when the differences between consecutive numbers are small or when the numbers appear in non-decreasing order.
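A minimal sketch of this transformation (and its inverse) in Python could look as follows; the helper names are illustrative only.

    def delta_encode(numbers):
        # Keep the first value; replace each following value by its difference
        # from the previous one.
        return [numbers[0]] + [numbers[i] - numbers[i - 1] for i in range(1, len(numbers))]

    def delta_decode(deltas):
        values = [deltas[0]]
        for d in deltas[1:]:
            values.append(values[-1] + d)
        return values

    ids = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]    # e.g., consecutive read identifier numbers
    print(delta_encode(ids))                          # [11, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    assert delta_decode(delta_encode(ids)) == ids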

As an example, let us encode a sequence of integers ⟨11, 12, 13, 14, 15, 16, 17, 18, 19, 20⟩ which, e.g., could be the read identifier number fields extracted from consecutive read identifiers in a FASTQ file. Since in this example the numbers are unique, each having the same probability of occurrence, the theoretical entropy (Eqn. 1.4) in bits per symbol is high and equal to 3.322. The theoretical minimum number of bits to represent the sequence is 3.322 × 10 = 33.22 ≈ 34 bits. However, applying the delta transformation yields as output the tuple (11, ⟨1, 1, 1, 1, 1, 1, 1, 1, 1⟩). Encoding all the values together, the theoretical entropy of the whole sequence is 0.469 bits per symbol on average. Therefore, the sequence could be encoded in no less than 0.469 × 10 = 4.69 ≈ 5 bits.

Another method, run-length encoding (RLE) [63], encodes consecutive symbol repetitions. While processing the input message, the number n of consecutive occurrences of a symbol s is encoded as a tuple (s, n). This method is especially useful in image compression (e.g., in the JPEG format), where long runs of identical pixel values can be represented in a more compact way. On the other hand, the move-to-front (MTF; or symbol ranking) [16] method uses a control stack of recently used symbols when transforming the input message. In this way, each symbol is replaced by its index in the control stack. The main rationale behind applying the method is that the most recently seen symbol has the highest probability of occurring again. Both methods are best shown applied to an example, as discussed below.

Let us encode a DNA sequence S = ACTACTACTACTACTACTTT (a short tandem repeat) of length 20, using the alphabet-to-integers mapping {A, C, G, T} → {0, 1, 2, 3}. The theoretical information entropy of S is 1.571 bits per symbol and the minimum number of bits to represent the sequence is 1.571 × 20 = 31.419 ≈ 32 bits. By applying MTF with the initial control stack ⟨A, C, G, T⟩, the sequence is transformed to S_mtf = ⟨0, 1, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0⟩. Figure 1.11 shows the encoding process step by step. The entropy of the transformed sequence is now 1.154 bits per symbol and the minimum number of bits to represent it is 1.154 × 20 = 23.080 ≈ 24. Since the sequence contains runs of the repeated symbols 2 and 0, it can be further transformed using RLE. The result is a collection of tuples ⟨(0,1), (1,1), (3,1), (2,15), (0,2)⟩. If we split the result into two separate arrays containing indices R_idx and repetitions R_num, i.e., R_idx = ⟨0, 1, 3, 2, 0⟩ and R_num = ⟨1, 1, 1, 15, 2⟩, we now need to encode two short sequences of length 5. The R_idx sequence has an entropy of 1.922 bits per symbol and can be represented using a minimum of 1.922 × 5 = 9.61 ≈ 10 bits. Analogously, R_num has an entropy of 1.371 bits per symbol and can be represented using a minimum of 1.371 × 5 = 6.855 ≈ 7 bits. Combined, the output could be encoded using a minimum of 17 bits.

A  ACGT  0
C  ACGT  1
T  CAGT  3
A  TCAG  2
C  ATCG  2
T  CATG  2
A  TCAG  2
...
C  ATCG  2
T  CATG  2
T  TCAG  0
T  TCAG  0

Figure 1.11: Encoding the sequence ACTACTACTACTACTACTTT using MTF. The columns denote, respectively, the current symbol to be encoded, the control stack buffer and the index of the current symbol in the stack.
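The MTF and RLE steps of this example can be reproduced with the following short Python sketch (an illustration only, not the implementation of any particular tool).

    def mtf_encode(sequence, alphabet):
        stack = list(alphabet)                    # control stack of recently used symbols
        output = []
        for symbol in sequence:
            index = stack.index(symbol)
            output.append(index)
            stack.insert(0, stack.pop(index))     # move the current symbol to the front
        return output

    def rle_encode(values):
        runs = []
        for v in values:
            if runs and runs[-1][0] == v:
                runs[-1][1] += 1                  # extend the current run
            else:
                runs.append([v, 1])               # start a new run
        return [tuple(run) for run in runs]

    mtf = mtf_encode("ACTACTACTACTACTACTTT", "ACGT")
    print(mtf)                                    # [0, 1, 3, 2, 2, ..., 2, 0, 0]
    print(rle_encode(mtf))                        # [(0, 1), (1, 1), (3, 1), (2, 15), (0, 2)]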

Lempel-Ziv coding

In the ’70s Abraham Lempel and Jacob Ziv designed the two most widely used variants of dictionary-based compression algorithms, namely LZ77 [221] and LZ78 [221]. When encoding, the algorithms try to find repeated subsequences in the input message and encode the repetitions with respect to previously seen ones residing in the dictionary. The main difference between the methods, however, lies in how the dictionary is defined and how the matches found are represented.

The main idea behind LZ77-family methods is the use of a sliding window technique, which can be seen as “sliding” a buffer through the input message from its beginning to its end and encoding the repetitions of subsequences inside this buffer. This buffer keeps track of the previously seen symbols (and sequences) and serves as a dictionary. Practically, this buffer is divided into two parts: a search buffer (i.e., the dictionary) of size b1 and a look-ahead buffer of size b2, containing the future part of the message to be encoded. The size of the buffer (and the length of the window) is b = b1 + b2. The encoding algorithm works as follows.

Figure 1.12: Illustration of encoding the sequence ACTACTACTACTACTACTTT using an LZ77 sliding window. The values on the right represent the triplets emitted at each encoding step.

At the beginning, when the search buffer is empty, it is initialized with b1 repetitions of the first symbol of the alphabet and b2 repetitions of the first symbol of the input message. Then, when processing the input message, the encoder first tries to find the longest prefix shared between the next part of the message in the look-ahead buffer (starting at position b1 + 1 in the buffer) and the sequence present in the search buffer (searching from its end, i.e., from position b1). If a match is found, the subsequence can be encoded with respect to the previously seen shared prefix. The match is encoded as a triplet (p, m, c), where p is the starting position of the match inside the search buffer, m is the length of the substring match and c is the following character (possibly a mismatch). The buffer is then shifted m + 1 positions to the left, filling up with the succeeding characters (the window is “slid”). Therefore, the encoding result is represented as a collection of triplets.

As an example, Fig. 1.12 illustrates a simplified encoding (using a sliding window technique) of the sequence ACTACTACTACTACTACTTT. As a side note, for simplicity, in step 3 the match has been encoded as (4, 3, A). However, some alternative algorithms may allow for a more efficient match representation, e.g., (4, 6, A) (or even (4, 15, T) if it is allowed to search beyond the look-ahead buffer once a match has already been found). In such a case, the length m of the found match is greater than the actual length of the matching substring ACT. Such an operation can be interpreted as generating m symbols using the matching substring, starting repeatedly from its beginning whenever its length is exceeded.
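For illustration, the following toy Python sketch implements the core of the LZ77 idea. It deliberately simplifies the buffer-initialization scheme described above: matches are searched in the already-encoded part of the message within a fixed-size window, and the first element of each triplet is the distance back from the current position rather than a position inside the search buffer.

    def lz77_encode(message, window=16, lookahead=8):
        i, triplets = 0, []
        while i < len(message):
            best_len, best_off = 0, 0
            start = max(0, i - window)
            max_len = min(lookahead, len(message) - i - 1)   # leave room for the next symbol
            for j in range(start, i):
                length = 0
                # Matches may overlap the look-ahead part, as in classic LZ77.
                while length < max_len and message[j + length] == message[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_off = length, i - j
            triplets.append((best_off, best_len, message[i + best_len]))
            i += best_len + 1
        return triplets

    print(lz77_encode("ACTACTACTACTACTACTTT"))
    # [(0, 0, 'A'), (0, 0, 'C'), (0, 0, 'T'), (3, 8, 'T'), (12, 6, 'T'), (0, 0, 'T')]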

The LZ78-family of compression algorithms, on the other hand, uses as a dictionary a list of previously encountered prefixes. The algorithm starts with an empty dictionary and fills it up while processing the input message. To encode a sequence at some position, it searches for the longest prefix in the dictionary (of sequences). The sequence match is encoded as a tuple (i, c), where i is the index of the longest prefix match from the dictionary (the index of the matching entry in the dictionary) and c is the following character (mismatch) after the prefix. The concatenation of the prefix and the character c is added as a new entry to the dictionary. The encoding goes on until the end of the input sequence. Therefore, the result is a collection of tuples.

The output of an LZ-family encoder can be further compressed using Huffman encoding or arithmetic coding. A variety of different modifications have been applied to the LZ-family of methods, greatly improving their encoding efficiency – LZSS [191], LZW [206], LZRW [209], or LZAP [190], to name a few. Comprehensive descriptions of the algorithms with examples can be found in [177]. As LZ-family methods are quite general in algorithm design and do not pose strict limitations on the size of the used dictionary buffers (e.g., the capacity of the LZ77 sliding window or the size of the LZ78 list), the compression performance, both in terms of compression ratio and speed, can be tuned to specific data processing requirements. Therefore, LZ-family methods are among the most widely used algorithms in data compression. They are an essential part of the DEFLATE or LZMA algorithms, which are implemented in the most popular data compression applications, such as gzip22, 7zip23 or Snappy24, and in image formats such as GIF or PNG.

Burrows-Wheeler transform

The Burrows-Wheeler transform [23] (BWT), also known as the block-sorting lossless data compression algorithm, is a sequence permutation algorithm developed by Michael Burrows and David Wheeler in 1994. It defines a reversible permutation of the original sequence in which subsequences sharing the same prefix tend to appear close to each other. Therefore, the obtained runs of characters can be encoded more efficiently by using locally-adaptive models.

To obtain the BWT of a given sequence S of length n, the algorithm proceeds as follows. Firstly, all cyclic rotations of the input sequence are generated, forming an n × n matrix. The rotations are then sorted lexicographically. The last column of such a sorted matrix is the BWT of S. The backward transformation is straightforward and requires only knowing the index I, which denotes the row in the matrix containing the original sequence. The BWT algorithm, however, is best visualized with a picture – Fig. 1.13 presents the BWT forward transformation of the sequence GCACTTTG, giving CGAGTTTC as a result.

22http://www.gzip.org
23http://www.7-zip.org
24https://google.github.io/snappy/

GCACTTTG    ACTTTGGC
CACTTTGG    CACTTTGG
ACTTTGGC    CTTTGGCA
CTTTGGCA    GCACTTTG
TTTGGCAC    GGCACTTT
TTGGCACT    TGGCACTT
TGGCACTT    TTGGCACT
GGCACTTT    TTTGGCAC

Figure 1.13: Illustration of the BWT transformation of the GCACTTTG sequence: all cyclic rotations (left) and the lexicographically sorted rotations (right). The last column of the sorted matrix indicates the result – CGAGTTTC.
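A naive version of the forward transform (and of the simple inversion procedure) fits in a few lines of Python; it is quadratic in space and meant only to illustrate the definition, not the suffix-array-based constructions used in practice.

    def bwt_forward(sequence):
        # Sort all cyclic rotations and take the last column; also return the
        # index I of the row holding the original sequence.
        n = len(sequence)
        rotations = sorted(sequence[i:] + sequence[:i] for i in range(n))
        return "".join(rotation[-1] for rotation in rotations), rotations.index(sequence)

    def bwt_inverse(last_column, index):
        # Repeatedly prepend the last column and re-sort to rebuild the sorted rotation matrix.
        n = len(last_column)
        table = [""] * n
        for _ in range(n):
            table = sorted(last_column[i] + table[i] for i in range(n))
        return table[index]

    bwt, i = bwt_forward("GCACTTTG")
    print(bwt, i)                                  # CGAGTTTC 3
    assert bwt_inverse(bwt, i) == "GCACTTTG"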

The output of the BWT encoded using MTF, followed by RLE and Huffman coding, is the basis of bzip225, one of the most commonly used textual data compression tools. Apart from being applied to data compression, the BWT has another very useful property. In 2000 Paolo Ferragina and Giovanni Manzini [55] designed the FM-index data structure – a compressed full-text substring index based on the output of the BWT. They showed that the BWT of a given string S, together with some auxiliary data structures, can be used as a space-efficient index of S itself. Hence, this data structure is nowadays extensively used in bioinformatics, especially in sequence mapping algorithms to perform approximate string matching, as mentioned in Section 1.1.3.

1.3 Motivation

Data volumes generated by sequencing technologies, especially by 2nd-generation sequencing technologies, have become a major issue for efficient storage and sharing. We have practically reached the symbolic barrier of 1000 USD in amortized costs per whole-genome re-sequencing of an

25http://www.bzip.org/ Chapter 1. Introduction 54 individual. However, there is growing concern about the explosion in growth of sequencing data, which emphasizes the importance on develop- ing efficient high-throughput sequencing data storage, management and compression methods [33, 92, 72, 43, 19]. In 2012 the National Institutes of Health (NIH) organization in the United States launched a Big Data to Knowledge (BD2K) initiative to advance research in biomedical Big Data to- wards precision medicine; significant resources were allocated to advance research in genomic data compression methods26. With further advance in sequencing technologies, which will increase their accuracy, reliability, and throughput, it is very likely that soon the re-sequencing of individuals will become a commodity. Therefore, in order to prevent the potential explosion of sequencing data and to enable its efficient sharing and pro- cessing (for instance, in the cloud, which has become very popular), the development of new compression methods is of great importance.

26https://datascience.nih.gov/bd2k/funded-programs/software

CHAPTER 2

STORAGE OF HIGH-THROUGHPUT SEQUENCING DATA

In this chapter we will present the state-of-the-art in the area of high-throughput sequencing data compression and storage. We will show how the different data compression methods and genomic data processing algorithms presented in Chapter 1 are applied to reduce the size of HTS data. Since we are mostly interested in reducing the large storage space requirements of HTS data in DNA re-sequencing pipelines, we primarily focus on compressing the data stored in FASTQ and SAM formats. We will show the challenges and techniques involved in compressing data in these formats by analyzing the methods used to compress the different parts of the files that they contain. Although the use-cases of the FASTQ and SAM formats are different, the HTS data representation in SAM format can be seen as a superset of the FASTQ format. Therefore, we will firstly analyze the techniques used to compress the reads in FASTQ format and, then, the corresponding ones used to compress alignments in SAM format. We will show that the techniques used to compress the data in both formats share a number of similar difficulties and follow analogous principles. Next, we will show alternative approaches to storing HTS data that are not strictly limited to compressed representations of data in FASTQ and SAM formats. Finally, we will briefly present the challenges and current solutions related to storing the long-read data from emerging third-generation sequencing platforms, which significantly differ from the most commonly used short-read data. The results of compressing HTS data represented either in FASTQ or SAM format and using different compression methods will be presented in Chapter 3.


2.1 Compression of raw reads in FASTQ format

2.1.1 Context

Raw sequencing reads generated by NGS machines are primarily stored in the FASTQ format1. As mentioned in Section 1.1.3, each FASTQ record consists of three fields: the read identifier, the DNA nucleotide sequence and the corresponding base calling quality scores. Depending on the sequencing machine used and the sequencing experiment design, multiple FASTQ files can be generated in a single experiment and they can be either merged together or stored independently. The raw FASTQ files are often compressed using gzip and this already allows us to obtain around a 3-fold reduction in size [43]. However, such a compression result is definitely not satisfactory, since gzip is a general-purpose text compressor and as such is unable to exploit the properties of genomic data represented in FASTQ format – fields containing data of different kinds are simply encoded together. Therefore, a variety of FASTQ format-specific compression methods have been developed, which share a number of common ideas.

A FASTQ file can be seen as a collection of records, each consisting of three different fields. The majority of modern2 FASTQ compressors split each record's content into three separate streams of data, where each stream contains data of a homogeneous type. The decomposition of the content of the records into different streams, each of which can be compressed using a different method, is a fundamental approach in order to achieve a reasonable compression of HTS data. For example, it has been demonstrated in [151] that, instead of compressing a whole FASTQ file as it is, one can achieve a > 10% improvement in the compression ratio just by splitting the FASTQ file into three streams and compressing each separately using the same compressor (in this case: gzip).
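As an illustration of this decomposition, the sketch below splits a FASTQ file into three homogeneous streams; it assumes the standard four-line record layout (identifier, sequence, '+' separator, quality scores) without line wrapping, and the file and function names are only examples.

    import gzip

    def split_fastq_streams(path):
        # Return the identifier, sequence and quality-score lines as three lists;
        # the '+' separator lines are dropped.
        ids, seqs, quals = [], [], []
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            while True:
                header = f.readline()
                if not header:
                    break
                ids.append(header.rstrip("\n"))
                seqs.append(f.readline().rstrip("\n"))
                f.readline()                       # '+' separator line
                quals.append(f.readline().rstrip("\n"))
        return ids, seqs, quals

Each of the three returned streams can then be compressed independently, e.g. with gzip, which is the simple setup reported to already improve the ratio in [151].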

Methods for compression of DNA sequences can be divided into four categories: general, mapping-based, assembly-based and read-reordering- based methods. Compressors using methods from the first category en- code the reads as they are and without any prior knowledge about the data

1Formally, sequencers usually output reads in their own custom format, from which they are then converted to FASTQ.
2Historically, some solutions like G-SQZ [196] or SeqDB [75] used to store sequence and quality symbols together prior to compression – such an approach, however, did not result in significant savings in data storage

– they apply simple transformations and statistical methods to exploit the local similarities between consecutive reads in the file. Mapping-based compressors use information about the reference genome, possibly mapping the reads to it and encoding only the minimal information about the mapping necessary to reconstruct the original reads. Assembly-based compressors assemble the reads in order to find possible overlaps among them. Compressors using methods from the last category reorder the reads inside the FASTQ file according to a specified criterion in order to exploit the similarity between the DNA sequences at the scope of the whole file, which allows the compression ratio to be improved.

In practice, however, advanced compressors use a combination of methods from different categories, performing the compression of DNA sequences as a multi-step process. For example, in the first step they can assemble contigs from the input reads, which can later be used as reference sequences when performing mapping and encoding of the reads. Hence, the compressors can be categorized by the core method (or approach) they use to compress (or process) the DNA sequences. From a high-level point of view, advanced DNA compression methods can be perceived as dictionary methods, where either externally provided reference sequences, self-assembled contigs or previously processed reads can be used as referential sequence matches. As a side note, it is important to emphasize that while developing any DNA sequence compression technique it is essential to encode the DNA sequences losslessly.

Read-identifiers-encoding methods fall into two categories: encoding by tokenization or differential encoding. In the first case, the read identifier string is split into tokens and then each token is encoded separately. In the second case, the read identifier is encoded differentially with respect to some identifier, which appeared previously in the stream. Usually, read identifiers are compressed losslessly, however, some compressors may prefer to skip storing the comment (the content after the first occurrence of a white space character in the identifier line).

Quality-scores-compression methods can be either lossless or lossy. Since the quality scores stream is characterized by high entropy and a high level of noise [19], it is challenging to compress it efficiently in a lossless manner. The quality stream usually occupies the majority of the space in the compressed archive; hence, a number of lossless and lossy quality-scores-compression strategies have been developed, with the research focus put on the latter in recent years.

The main differences between FASTQ format compressors, therefore, lie in the methods used to compress different kinds of data encoded into (pos- sibly, separate) streams. Hence, in this section, we provide a breakdown analysis of the FASTQ compressors focusing on compression methods applied per each kind of data.

2.1.2 Compression of read identifiers

Fields tokenization

FASTQ read identifiers can contain information such as a unique instrument name, the read number, the flowcell lane, the read length, etc. The format of the field is unfortunately arbitrary and sequencing-machine specific [19]. Nevertheless, read identifiers can be perceived as a collection of different fields (tokens) separated by delimiters – as presented in Fig. 1.5. These tokens can be of numeric, character or string type, having either a constant (fixed) or a variable value among all the read identifiers in the whole FASTQ data set – these characteristics define the token class, which later determines the compression method. Having this in mind, some compressors perform tokenization of the read identifiers and compress each token using a dedicated encoding tailored to its characteristics. When encoding, usually the previous read identifier is used as a context. Since read identifiers can share a significant amount of information within the dataset (e.g., reads coming from one experiment may share the same experiment identifier), the idea is to store the fixed fields only once and to efficiently encode the variable ones.
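The general idea can be sketched as follows; the snippet is purely illustrative (the example identifiers and the delimiter set are hypothetical and not tied to any particular instrument), showing constant tokens being flagged, numeric tokens being delta-encoded and everything else being stored verbatim.

    import re

    def tokenize(identifier):
        # Split on typical delimiters, keeping the delimiters as tokens.
        return re.split(r"([:\s/\.])", identifier)

    def encode_against_previous(current, previous):
        cur, prev = tokenize(current), tokenize(previous)
        encoded = []
        for i, token in enumerate(cur):
            ref = prev[i] if i < len(prev) else None
            if token == ref:
                encoded.append(("same",))                       # constant token: nothing to store
            elif token.isdigit() and ref is not None and ref.isdigit():
                encoded.append(("delta", int(token) - int(ref)))
            else:
                encoded.append(("literal", token))
        return encoded

    prev = "@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36"
    curr = "@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=36"
    print(encode_against_previous(curr, prev))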

A good example of a tool utilizing the tokenization approach is the block-based (i.e., reading and compressing the file in a block-by-block manner) DSRC [42] compressor. DSRC inspects each field separately, gathering all the field statistics at the end of processing each block. Then, knowing the data characteristics, numeric fields can be either differentially encoded, binary encoded using a possibly minimal bit representation or encoded using order-0 Huffman coding. Non-numeric fields are encoded column-wise, i.e., for each field a vector is constructed storing the characters occurring at the field's positions observed in all the reads within the block. Lastly, non-fixed fields are encoded using an order-0 Huffman coder.

Similarly, Quip [85], Fqzcomp [19] and LEON [15] tokenize the read identifiers into n separate fields. They use different statistical models per field. Depending on its value type(s), they may use multiple models per field in order to achieve the best compression ratio. In this way, only a minimal representation of the identifier is stored, encoding only the differences in values (if any occur) with respect to the previous read identifier. Therefore, a numerical difference (delta) or the length of the longest common prefix together with the mismatching characters is encoded. The final entropy coding is performed using arithmetic coders.

Differential encoding

In contrast to tokenization, some compressors operate directly on whole read identifiers, storing the differences between consecutive ones. For example, Fastqz [19] encodes read identifier lines differentially and column-wise. It stores the differences between consecutive identifier lines and encodes them as a collection of triplets: a numeric field increment in the range 0–255 selecting the starting column (position), the length of the string match and the trailing differences (on a similar basis to Lempel-Ziv coding, but using only the last read as the LZ buffer). In the next step, the encoded differences are passed through a mix of different context models and compressed using ZPAQ3 [125].

Similarly, LW-FQzip [219] encodes read identifiers differentially, but with respect to a number of previous ones. When compressing read identifiers, it applies incremental encoding, storing as a result only the common prefixes or suffixes and their lengths. FQC [49], on the other hand, performs a one-time indexing of the invariant parts found across a number of read identifiers – such information about the variability in the identifiers is further stored in a delta-encoded format. The encoded output in both LW-FQzip and FQC is compressed using a LZMA-based compressor.

3http://mattmahoney.net/dc/zpaq.html

2.1.3 Compression of DNA sequences

General methods

Compression methods falling into this category are used to compress the DNA sequence data as it is, i.e., they only use data transformations and statistical methods to compress the reads in a local context. Usually, the read order is also kept intact. DSRC is a good example of a compressor implementing this category of methods, and it introduces some interesting concepts. In order to lower the entropy of the DNA stream (limiting the alphabet to possibly only the 4 DNA symbols), the rare N symbol occurrences and other IUPAC symbols are encoded in the quality stream. Since the numerical range of the quality values is at most 0–94 [34], 161 additional values can be stored in a single byte by mixing together the base symbol and the quality score. When the DNA alphabet consists of a maximum of 4 values, DSRC encodes the sequence using two bits per symbol. Otherwise, it encodes the sequence data using an order-0 Huffman coder. Additionally, to improve the compression ratio even further, DSRC provides an optional LZ77-based dictionary compression method. When compressing, it looks for sequence matches in a buffer (of configurable capacity) consisting of previously encoded sequences. A similar approach of reducing the DNA stream alphabet size has been applied in the recently published FQC [49] compressor, where the final compression stage is performed using a LZMA-based compressor.
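The two-bits-per-base packing mentioned above is straightforward; the following sketch (not DSRC's actual implementation) assumes the alphabet has already been reduced to {A, C, G, T} and that the sequence length is stored elsewhere.

    CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
    BASES = "ACGT"

    def pack_2bit(sequence):
        # Pack four bases into each output byte, two bits per base.
        packed = bytearray()
        for i in range(0, len(sequence), 4):
            chunk = sequence[i:i + 4]
            byte = 0
            for base in chunk:
                byte = (byte << 2) | CODE[base]
            byte <<= 2 * (4 - len(chunk))          # pad the last byte with zero bits
            packed.append(byte)
        return bytes(packed)

    def unpack_2bit(packed, length):
        bases = []
        for byte in packed:
            for shift in (6, 4, 2, 0):
                bases.append(BASES[(byte >> shift) & 3])
        return "".join(bases[:length])

    seq = "GCACTTTG"
    assert unpack_2bit(pack_2bit(seq), len(seq)) == seq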

On the other hand, Fqzcomp [19] uses a configurable order-k Markov model when encoding sequences, using the k previous bases as a context. This model is considered the main one, with a possible maximum order of 14. Moreover, Fqzcomp provides an additional fixed-size order-7 model to improve the accuracy of the predictions of the main model. To improve the compression efficiency even further, the main model can be updated using the reverse complement of the encoded sequence. The final entropy coding is performed using an arithmetic coder. As an important constraint, Fqzcomp limits the available DNA alphabet to only Σ = {A, C, G, T} – the information about the occurrence of the N symbol is stored in the quality stream (similarly to DSRC), encoding the value of the corresponding base quality score as 0.

Quip [85] and Fastqz [19] also implement general methods for DNA sequence compression in one of their modes. Quip, in its default mode, uses a 12th-order Markov model with an arithmetic coder to compress the sequence data. Fastqz, by contrast, packs the base symbols together into triplets or quartets by assigning A = 1, T = 2, C = 3 and G = 4 (the N base symbol is assumed to be uniquely identified by its corresponding quality score value of 0) and compresses them using ZPAQ. The focus of these solutions is put, however, on mapping-based compression of the reads, which will be explained in the following subsection.

Mapping-based methods

The underlying idea of compression methods falling into this category is to map FASTQ reads to the provided reference sequence and to encode the reads within the mapping context. Such an approach gives us the possi- bility of eliminating high sequence data redundancy by efficiently storing only the sequence match position, size of the match and the potential dif- ferences from the reference. The non-mapped sequences, however, can be stored using the general methods, as presented in the previous subsection. The drawback of such an approach is that the same reference sequence needs to be available both during compression and decompression.

Fastqz [19] (in the reference-based mode) and the recently published LW-FQZip [219] follow this approach. Both implement their own lightweight mapping algorithms with indexing based on hash tables to perform efficient substring searches. While processing reads, Fastqz will try to match the sequences to the reference and encode the best match as a 32-bit pointer, a direction bit and a list of up to four mismatches. The result is finally encoded using different context mixing strategies implemented in ZPAQ. By contrast, LW-FQZip, after finding the best match, tries to perform a local realignment of the read around the mapping position. For each read, the position and the edit operations (match, insertion, deletion, substitution) with their lengths are reported. The final output is compressed using a LZMA-based compressor.

Assembly-based methods

Compression methods falling into this category apply sequence assembly methods to compress the reads in an efficient way. For example, Quip

[85] uses the first 2.5 million reads in a file (by default; a configurable parameter) to perform de novo assembly of the reads, creating possibly large contigs. The assembly method is based on probabilistic de Bruijn graphs, which utilize a Bloom filter [20] data structure for efficient k-mer counting and filtering. The constructed contigs are used in the encoding stage as reference sequences to perform reference-based compression of the remaining reads. The resulting alignments are encoded using different models with an arithmetic coder.

A similar approach was demonstrated in the recently published LEON [15] compressor. Prior to compression, LEON performs de novo assembly of the sequences, constructing a probabilistic de Bruijn graph based on Bloom filters, which are used for efficient k-mer filtering. Nevertheless, in contrast to Quip, LEON constructs the graph using all the reads inside the FASTQ file. After building the graph (which will be stored in the output archive at the end of the compression procedure) it proceeds to encode the reads within the graph context. Each read is encoded as its anchor k-mer (the k-mer serves as an anchor node in the graph) together with the branching information and the sequence length. The output is encoded using an order-0 arithmetic coder.

Reads reordering methods

During the library preparation process, DNA molecules are cut randomly into many small fragments. These fragments are later sequenced by an in- strument, which outputs the resulting short reads into a number of FASTQ files. It is a known property that the order of the DNA sequences in the resulting FASTQ file is completely arbitrary [56]. Hence, the read order can be altered, but remembering to preserve the information about the reads pairing if the library was sequenced in a paired-end configuration. Therefore, compression methods falling into this category reorder the reads according to some criterion in order to improve the compression ratio. Since the generated sequencing data is characterized by a high re- dundancy (especially present in the data generated from deep-sequencing experiments), by reordering the reads compressors try to exploit large sequence similarities between groups of reads.

The first compressor following this approach was SCALCE [70]. Its read-reordering technique is based on the Locally Consistent Parsing (LCP [13])

Another interesting method was proposed by BEETL [14, 37, 83]. Its pri- mary use is efficient k-mer retrieval from large collections of sequencing reads. However, as a proof-of-concept, it has been also applied for com- pression of raw DNA sequences and FASTQ reads, achieving very good compression results. The core approach is based on applying the Burrows- Wheeler transformation over the collection of input DNA sequences, which allows to exploit high redundancy present in sequencing data. Moreover, BWT transformation is also used to build an FM-index (see Section 1.2.5) for the data, which allows for fast substring lookups. The generated BWT output is run-length encoded and indexed. By compressing the output with PPM-based or LZMA-based compressor a high compression ratio can be achieved. kPath [87] is a recent DNA sequence-only compressor, which achieves a high compression ratio. Its compression algorithm is as follows. Given the reference sequence and k-mer length, it builds a de Brujin graph based on k-mers from the provided sequence. Then, it trains the probability distribution model based on the calculated k-mer frequencies. When pro- cessing the reads, they are firstly reverse-complemented, where heuristics are used to determine which orientation matches better the provided ref- erence sequence. In the next step, the reads are sorted lexicographically according to their initial k symbols – the goal is to place the reads with the same possibly longest substrings close to each other. The remainder of each read is encoded as a path within the built graph. The final output is encoded using arithmetic coding with the probability distribution model trained using the supplied reference sequence.

Mince [158] is another recently published DNA-only compressor. It clusters the reads into different bins which correspond to different minimizers, i.e., lexicographically smallest k-mers [172]. For each read, Mince searches for its k-mers and assigns a bin if the bin's minimizer is present in the read and if the read shares possibly the maximum number of k-mers with the reads already stored in the bin. After clustering, the reads inside the bins are sorted lexicographically starting from the position of the minimizer. Finally, the reads are compressed using the LZMA-based compressor Lzip4.

2.1.4 Compression of quality scores

Lossless methods

It has been widely noted that in the data produced by most sequencing technologies there is a direct correlation between position and quality score values and that the read quality typically degrades along the length of the sequence [95, 93, 19]. In addition, a significant number of quality sequences generated by Illumina machines end in a run of score 2 (ASCII symbol ’#’), which may be related either with a low quality of the sequence or with a common issue of Illumina base calling process [142]. Therefore, a majority of FASTQ compressors use the above observations trying to compress quality scores efficiently. They try to model the stream of quality scores either as a piecewise stationary memoryless source or as a Markov model (see Section 1.2.3). Hence, they implement different context-based models, where a context can be built of the preceding quality values and the base position inside the read.

For example, the authors of DSRC [42] noticed three types of quality streams characterized by different statistics and proposed different compression methods accordingly. The first type consists of quasi-random values with a mild dependency of the symbol distribution on the position. DSRC handles this type of quality stream by gathering statistics and encoding the values column-wise, i.e., the statistics of quality score occurrences per position are gathered and later encoded using Huffman coding. The second type is a special case of the first one, where quality sequences end in runs of the low score 2 – this information is encoded simply as a bit flag. Lastly, in some cases the authors observed a strong local correlation of quality score values clustered within individual records. This type of quality data is compressed using RLE followed by an order-1 Huffman entropy coder. By contrast, the authors of FQC [49] simplify the above approach and just use RLE followed by a LZMA-based compressor to store the quality scores.

4http://www.nongnu.org/lzip/

Moreover, Fqzcomp [19] authors observed that sequences tend to be over- all either good or bad. This observation has a direct application for encod- ing the quality scores. When processing reads, if a read already contains a significant number of low quality values it is more likely that even more low quality values will appear later when encoding its quality scores. Therefore, to support multiple quality data models (including also the general ones mentioned at the beginning of this subsection), Fqzcomp utilizes a context mixing technique, using 5 different contexts to predict the quality values and to compress them using arithmetic coders – the exact definition of how the contexts are built can be found in the Fqzcomp publication [19].

Fastqz [19] encodes quality scores using special byte codes, as follows. Quality scores of 38 (the highest observed value) are encoded as runs of up to length 55 using RLE. Quality scores in the range 35–38 are packed as triplets or, if that is not possible, are either packed into pairs (applies to scores in the range 31–38) or stored as single bytes (applies to all other quality scores). If a read ends with a run of the low quality value 2, this is indicated by just a single symbol. The resulting byte code is compressed by passing it through a mix of context models provided in ZPAQ.

Another approach was proposed by Quip [85], which uses an order-3 arithmetic coder. As a context, three preceding quality scores are used (but binning coarsely the last two). In addition, Quip conditions the context on 2 variables: the encoding base quality position within the read and a number of “large” differences between the previous quality scores between adjacent positions. Other compressors, like the LEON [15] or the SCALCE [70], just use order-3 arithmetic coder to store the quality data in lossless mode – their primary focus is on the lossy compression.

Lossy methods

Quality scores are the most difficult type of data to compress – they occupy more than half of a gzip-compressed file and contain a high degree of noise [19]. Moreover, NGS data analysis applications rely on quality scores in a heuristic manner [153], which puts in doubt the usefulness of the currently used high resolution of values. Therefore, Illumina has recently proposed a reduced quality score resolution [80], lowering the available quality score range to only 8 values (or bins5). Such a quantization already allows us to reduce the size of a gzip-compressed FASTQ file by about 30% compared to one without quality binning. Moreover, Illumina showed that such a quantization does not degrade the results of variant calling and it can be considered as an upcoming standard. New Illumina instruments, such as the Illumina HiSeq X sequencers, already output scores in the binned form. Therefore, in order to store FASTQ files in an efficient way, the lossy compression of quality scores is becoming a must, and a number of lossy compression methods have been investigated.

Table 2.1: Illumina 8-level quality value binning scheme. The quality values are in Phred scale. Source: [80]

Quality values range    Mapped bin value
'N' (no call)           'N' (no call)
2–9                     6
10–19                   15
20–24                   22
25–29                   27
30–34                   33
35–39                   37
≥ 40                    40

In [204] the authors state that quality scores are the result of an irreversible quantization of some original (floating point) sequencing error probabilities, with |Σ| = 94 distinct values. Therefore, the quantization can be viewed as the process of partitioning the probability interval [0, 1] into |Σ| sub-intervals, or bins. Quality scores falling into the same bin share the same quantized quality score. Under the above assumptions, the authors propose a number of different binning strategies and show how they affect the sequence mapping accuracy. The most important scheme is the 8-level binning, which is presented in Table 2.1. The above-mentioned reduced quality score resolution proposed by Illumina is based on this scheme.
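For reference, the mapping of Table 2.1 can be expressed directly as a small function; the sketch below is illustrative, assumes Phred+33 ASCII encoding for the usage example, and leaves the handling of 'N' (no call) positions to the caller.

    def illumina_8level_bin(q):
        # Map a Phred-scale quality score to its bin value according to Table 2.1.
        if q < 2:
            raise ValueError("scores below 2 are not covered by the binning scheme")
        if q <= 9:
            return 6
        if q <= 19:
            return 15
        if q <= 24:
            return 22
        if q <= 29:
            return 27
        if q <= 34:
            return 33
        if q <= 39:
            return 37
        return 40

    # Applied to an ASCII-encoded quality string (Phred+33 offset assumed):
    binned = "".join(chr(illumina_8level_bin(ord(c) - 33) + 33) for c in "IIIIHHGF@@@:::###")
    print(binned)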

In SCALCE [70] the authors observed that when compressing quality scores, for any basepair the quality values of its surrounding basepairs would be either the same or within a small range of the current value. Therefore, given an error threshold, they apply a lossy transformation of the values

and reduce the quality score alphabet size. The goal is to reduce the variability among the quality scores in the proximity of local maxima, up to the threshold value, which helps to reduce the size of the output stream.

5The scheme is sometimes also called Illumina binning

Fqzcomp [19] authors observed that quality scores of similar values tend to be clustered in consecutive blocks. Therefore, as a lossy transforma- tion, they propose to smooth the qualities within blocks to ensure that, for each value, the difference from the original one is no more than the given threshold. On the other hand, FQC [49] proposes 3 levels of lossy transformations, with a predefined quality cutoff threshold and group size. Each quality score below the cutoff is replaced by 0 value, whereas quality scores above it are smoothed in groups.

To lossily compress quality scores, LEON [15] uses k-mer statistics coming from the input dataset, calculated during the data preprocessing step. It assumes that if nucleotide bases are covered by a sufficient number of solid k-mers (i.e., k-mers that occur more than a minimal number of times in the dataset), they can be safely considered error-free and they are assigned an arbitrarily high quality value. A similar approach was also explored by Janin et al. [82], where the authors performed smoothing of the score values by computing the longest common prefix for the reads using the Burrows-Wheeler transformation. The authors of both projects also show that performing a conservative lossy transformation of quality scores has a minimal impact on the results of downstream analyses while greatly reducing the compressed file size.

QualComp [152] is a quality-only compressor. It utilizes techniques from rate distortion theory and assumes that quality scores are generated by multivariate Gaussian source. Given the user-provided maximum distor- tion rate, QualComp tries to use an optimal number of bits to represent scores. To do so, it tries to minimize the distortion of the values by minimiz- ing the mean square error rate between original and transformed values. It uses k-means algorithm [123] to cluster the qualities into groups for which the mean square error is minimal. Finally, it applies a lossy transformation to the quality values inside the clusters and encodes the values using an arithmetic coder.

QVZ [126] is another quality-only compressor based on rate-distortion theory. It assumes a positional correlation between the neighboring values and models a sequence of consecutive quality scores as a Markov chain of order 1. Firstly, for all the quality sequences in the input file (or inside a block of a given size), it calculates the Markov model state transition probabilities between adjacent values. In the following step, these transitions are used to compute a list of quantizers (called a codebook), indexed by the base position in the read. With the codebook ready, the scores are transformed using a different quantizer per position. Finally, the transformed scores are encoded using an arithmetic coder.
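A minimal sketch of the first, model-estimation step is given below; it only computes the empirical order-1 transition probabilities from a block of quality strings, while the construction of the per-position codebook of quantizers (the core of QVZ) is omitted.

```python
# Estimate order-1 Markov transition probabilities between adjacent quality values
# from a block of FASTQ/SAM quality strings (Phred+33). Illustration only.
from collections import defaultdict

def transition_probabilities(quality_strings, offset=33):
    counts = defaultdict(lambda: defaultdict(int))
    for qs in quality_strings:
        values = [ord(c) - offset for c in qs]
        for prev, curr in zip(values, values[1:]):
            counts[prev][curr] += 1
    # normalize the counts row-wise into conditional probabilities P(curr | prev)
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {curr: n / total for curr, n in row.items()}
    return probs

# Example:
# p = transition_probabilities(["IIIIHHII", "IIHHHHII"])
# p[40][40] is the empirical probability that a Q40 value is followed by another Q40.
```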

Quartz [217] is a recent quality-only compressor which, similarly to LEON, utilizes pre-calculated k-mer statistics to correct the quality scores. However, in contrast to LEON, Quartz utilizes an external resource of k-mer statistics. It counts k-mers (k = 32 bp by default) in a given external set of files, which is later used as a dictionary while compressing a new file. Such a dictionary can be generated once and used multiple times. When compressing quality scores, Quartz firstly tries to identify all k-mers present in the reads that are within a small Hamming distance from the k-mers present in the provided dictionary. Any quality value at a given position that is consistent with at least one of the supporting k-mers is set to the default (maximum) value, whereas a quality score that is divergent from all supporting k-mers at a given position is kept intact. The authors show that their approach only slightly affects the accuracy of downstream analysis and, in some cases, it may even improve the genotyping accuracy of some selected pipelines.

2.2 Compression of mapped reads in SAM format

2.2.1 Context

Following its successful application in the 1000 Genomes Project [1], the SAM format became a de facto standard for the representation of information about sequence alignments (the data is usually stored in the compressed BAM format, as explained below). In general terms, the information stored in SAM format can be seen as a superset of the equivalent stored in FASTQ format, where FASTQ reads are extended with information about their alignment and with some optional information. Although the use-cases of FASTQ and SAM formats are different, they share a number of similar methods to compress the HTS data they represent.

Analogously to the case of compressing raw FASTQ reads, to efficiently compress SAM alignments the content of the record fields can be split into a number of separate streams. The streams can then be compressed using possibly different compression methods. However, when compressing SAM alignments, it is more practical to analyze the available compression solutions in terms of the proposed methods covering different categories of data. In general, four different categories of data can be distinguished: query template names, sequence alignment data, quality scores and optional fields. Therefore, different SAM format compressors use a set of possibly different methods to compress these kinds of data.
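As an illustration of this decoupling, the sketch below splits SAM records into the four categories of streams named above; real compressors usually work at a finer granularity (often one stream per field), but the principle is the same.

```python
# Sketch of decoupling SAM records into per-category streams prior to compression.
# Field positions follow the SAM specification; the grouping mirrors the four
# categories described in the text.

def split_sam_line(line: str) -> dict:
    fields = line.rstrip("\n").split("\t")
    return {
        "names":     [fields[0]],        # QNAME
        "alignment": fields[1:10],       # FLAG..TLEN and SEQ
        "qualities": [fields[10]],       # QUAL
        "optional":  fields[11:],        # TAG:TYPE:VALUE triplets
    }

def split_sam_into_streams(lines):
    streams = {"names": [], "alignment": [], "qualities": [], "optional": []}
    for line in lines:
        if line.startswith("@"):
            continue                     # header lines are handled separately
        parts = split_sam_line(line)
        for key, values in parts.items():
            streams[key].extend(values)
    return streams                       # each stream can then be compressed separately
```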

However, in contrast to FASTQ files, the order of the reads (alignments) stored in SAM files is important for practical reasons. The alignments are typically sorted according to their mapping position (and are indexed), which allows for fast random access to alignments by specifying the chromosome and the position within the query range. Hence, such reordering of the reads may influence the efficiency of the methods used to compress the different categories of data, compared to the ones used in FASTQ file compression.

Methods used to compress the alignment data can be divided into two main categories, namely non-reference-based and reference-based methods. The difference between them is whether or not the reference sequence is used during the compression and/or decompression stages. Similarly to the case of FASTQ format compressors, some advanced SAM compressors can also utilize a mix of different compression techniques to compress the sequence alignment data. Since the data represented in SAM format can be seen as a superset of the data represented in FASTQ format, SAM compressors can also utilize the previously described methods to compress DNA sequences (as presented in Section 2.1.3). For example, a compressor can assemble the sequences into contigs and then encode the alignments with respect either to the provided reference sequences or to self-built contigs, depending on which option provides a better compression result. Nonetheless, SAM format compressors are usually analyzed in the context of whether they use the reference sequence or not.

Considering methods to compress the query template names, the classification of the methods and the techniques used to compress them in FASTQ format also apply here, hence we won't cover them in detail. The description of the methods, with their potential difficulties, can be found in Section 2.1.2. Importantly, however, some of the methods used to compress the read identifiers can also be applied to compress the optional fields, and we will cover these techniques in brief.

Similarly to the case of FASTQ files, in SAM files the quality scores tend to be the most challenging kind of data to compress, occupying in lossless form the majority of space in the compressed archive [25, 71]. Most of the methods used to compress quality scores in FASTQ format can also be used to compress them in SAM format, both in lossless and lossy modes. These solutions have been described in detail in Section 2.1.4, hence we will only focus on the methods which are specific to the SAM format, i.e., the ones utilizing the alignment information to aid the compression.

As a side note, when compressing FASTQ files it is critical to preserve all the DNA sequence data, agreeing only to apply a controlled degree of information loss to either the quality or the read identifier streams. However, some SAM format compressors may not fully preserve all the data, not only by lossily compressing quality scores or removing query template names, but also, e.g., by discarding the optional fields of the alignments or by completely discarding the unaligned reads.

2.2.2 BAM and CRAM formats

SAM content is normally represented in raw text format and, for practical reasons, the data is typically stored in its compressed form – in the Binary Alignment/Map (BAM) format. Storing alignment data in BAM format is also currently considered a de facto standard. The alignments inside a BAM file are packed into independent blocks which are compressed using the Blocked GNU Zip Format (BGZF), a gzip-based compression method. Some strong advantages of BGZF are random access and indexing, which allow for selective decompression of blocks. Unfortunately, the alignment data is stored in blocks as it is, i.e., the data from the fields is lumped altogether and not decoupled. As a consequence, the compressed files exhibit similar problems as gzip-compressed FASTQ files, where the compression ratio is not impressive.

Recently, however, the CRAM6 format has started being adopted as a prominent replacement for the BAM format. A notable improvement is that the alignment data can be stored using a reference-based encoding scheme, greatly reducing the sequence data redundancy. Moreover, in the CRAM format, in contrast to BAM, each field can be encoded separately as a data stream. It is also possible to select a different codec per data stream, either from the built-in ones or from external ones. The built-in codecs include binary encoding, Golomb encoding, Gamma coding and Huffman coding. The external codecs include rANS [47], gzip and possibly more. Therefore, the wide range of features supported by the CRAM format allows compressors using it to achieve a very good compression ratio.
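To give a flavour of the lighter built-in integer codecs, the sketch below implements Elias gamma coding of positive integers; CRAM's actual bit-level framing is defined by its specification and is not reproduced here.

```python
# Elias gamma coding sketch: a unary prefix giving the bit length, followed by the
# binary representation of the value. Defined for positive integers only.

def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma coding is defined for integers >= 1")
    binary = bin(n)[2:]                        # binary representation without '0b'
    return "0" * (len(binary) - 1) + binary    # unary length prefix + binary value

def elias_gamma_decode(bits: str) -> int:
    zeros = 0
    while bits[zeros] == "0":                  # count the unary prefix
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2)

# elias_gamma_encode(9) -> '0001001' (three prefix zeros, then 1001)
```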

Support for the CRAM format has been implemented as a part of the HTSlib and HTSjdk7 libraries, which are used for reading and writing high-throughput sequencing data formats. These libraries are also used by the most popular tools in genomic data analysis, such as SAMtools, GATK or Picard, and by compression tools such as SCRAMBLE [18] and CRAMTools8. Therefore, apart from offering significant improvements in compression ratio and a data processing speed comparable to BAM, CRAM provides a smooth transition from the BAM format within the current genomic data analysis ecosystem. As a side note, the initial implementation of the CRAM format in CRAMTools did not fully support lossless compression, i.e., the compressor was not storing query template names and was downsampling the quality scores. Hence, its potential adoption in genomic data processing pipelines as a replacement for the BAM format has been very slow and cautious.

2.2.3 Compression of query template names

A query template name of a SAM alignment is similar to the read identifier field of a FASTQ read9. Hence, the methods used to compress query template names (with the accompanying difficulties) are similar to the ones used to compress FASTQ read identifiers. Analogously, they fall into two categories – based on tokenization and on differential encoding. However, the main difference between compressing this kind of data in FASTQ and SAM formats lies in the fact that the reads are reordered according to their mapping position, which differs from the order in the original FASTQ file and which may influence the compression. This is due to the fact that in the original FASTQ files some fields present in the identifiers of consecutive reads tend to exhibit some sort of order (leading to better compression), which is not kept when the reads are stored sorted by their mapping position.

6 https://samtools.github.io/hts-specs/
7 http://www.htslib.org
8 http://www.ebi.ac.uk/ena/software/cram-toolkit
9 The query template name field does not contain the initial '@' symbol present at the beginning of the FASTQ read identifier field, nor can it contain any comments.

Nonetheless, Quip [85], which is both a FASTQ and a SAM compressor, utilizes the same tokenization method to compress this kind of data in both formats. Similarly, Samcomp [19] utilizes the read identifier compression routines from Fqzcomp [19] to compress query template names. Other tokenization-based compressors include SCRAMBLE [18] and DeeZ [71], whereas compressors utilizing differential encoding are SAMZIP [176], NGC [162] and HUGO [114]. The principles behind tokenization-based and differential-encoding-based methods have been described in Section 2.1.2.

2.2.4 Compression of alignment data

Non-reference-based approach

Methods falling into this category do not use externally provided reference sequences in order to compress the alignment data. Some solutions, however, may apply specific data remodeling techniques by transforming the initial data representation into a different form prior to compression in order to improve the compression ratio. Some may also skip encoding parts of the alignment data and re-generate them during decompression by analyzing and combining data present in other fields.

SAMZIP [176] was one of the early solutions for compressing segment and alignment data. It decouples the data from the SAM fields and encodes each field separately. To encode the SEQ field, SAMZIP considers only 4 DNA symbols and encodes them using 2 bits per base. For any rare occurrences of the 'N' symbol it stores their positions separately. The alignment fields are encoded as follows. The FLAG, RNAME, MAPQ, PNEXT and CIGAR fields are encoded using run-length encoding. As POS values can span large values and appear in an increasing order (in a sorted SAM file), the values are delta encoded, followed by run-length encoding. To encode RNEXT it uses a mix of two methods – binary encoding (to encode the frequent values of the '*' and '=' symbols) and run-length encoding. SAMZIP does not encode the value of TLEN10. The produced output files, with possible unaligned reads (stored as they are), are compressed using WinRAR. Such an approach already provides a significant improvement in reducing the storage footprint – alignments compressed by SAMZIP can occupy up to 40% less space than when stored in BAM format.

10 In the initial version of the SAM format, this field was not yet present.
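The sketch below illustrates the delta-then-run-length treatment of sorted POS values described above; it is an illustration of the principle rather than a reimplementation of SAMZIP.

```python
# Delta encoding followed by run-length encoding of sorted alignment positions.

def delta_encode(positions):
    prev = 0
    deltas = []
    for pos in positions:
        deltas.append(pos - prev)   # positions are non-decreasing in a sorted SAM file
        prev = pos
    return deltas

def run_length_encode(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, count) for v, count in runs]

# Example for a deeply covered region:
# positions = [1000, 1000, 1000, 1001, 1001, 1005]
# run_length_encode(delta_encode(positions)) -> [(1000, 1), (0, 2), (1, 1), (0, 1), (4, 1)]
```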

To encode sequence data, Samcomp [19] utilizes the FLAG, POS and CIGAR fields to anchor each base to a virtual reference coordinate and then encodes the bases according to a calculated per-coordinate model. As more and more segments align to the virtual reference coordinates, the statistical model improves its accuracy and the data can be compressed more efficiently. Optionally, Samcomp can use an external reference sequence to seed the model's initial probabilities. However, given the usually deep coverage of the generated sequencing data, the model adapts itself well to the characteristics of the DNA stream, achieving an encoding efficiency close to the one obtained when supplying an external reference sequence. The remaining alignment fields are encoded using arithmetic coders, with a different model per field type. A similar approach is used by Quip [85], which is, however, strictly a reference-based compressor, requiring the reference sequence to be present both when compressing and when decompressing the SAM files.

CSAM [25] proposes a method to store sequence data based on sequence assembly principles. For a group of reads, it assembles from the segments a consensus sequence called a Presumed Reference Sequence (PRS). To reduce the DNA alphabet size, it uses only the 4 standard base symbols. Then, while encoding the segments, the sequence is stored as differences relative to the PRS (which serves as an alternative to the reference sequence, which is not present) – either as a set of copy (match) operations or replace operations (followed by the replacement symbol(s)). The position is encoded differentially with respect to the previous alignment's position. Such transformed data, together with the remaining fields and unmapped segments, is stored in separate streams, which are finally compressed using gzip.

NGC [162] introduces a more advanced alignment encoding. To store SEQ data it applies vertical differential run-length encoding (VDRLE). The process can be sketched as follows. Firstly, on a similar basis to CSAM, NGC analyzes all the bases covering each genomic position and builds a common consensus sequence. In the next step, it processes reads in groups, but instead of encoding each segment separately (known also as a horizontal approach), NGC encodes bases per genomic position (a vertical approach) as differences with respect to the built consensus sequence. As in the majority of cases there will be a high degree of matches, the differences can be efficiently encoded using RLE. The remaining FLAG, RNAME and MAPQ fields are encoded using RLE. POS and PNEXT values are delta encoded, with the resulting POS values further encoded by a Golomb/Rice coder. CIGAR values are not directly stored and are re-generated while decompressing the data (based on the VDRLE-encoded segment data and the consensus metadata). Such transformed data, with possible unaligned reads, is compressed using a general purpose compressor, namely either gzip, bzip2 or an LZMA-based one.
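The following sketch shows the vertical idea on a single genomic position: the bases of the overlapping segments are compared against the consensus base and the resulting, mostly identical, symbols are run-length encoded. This illustrates the principle only and is not NGC's actual VDRLE implementation.

```python
# Vertical encoding of one genomic position: compare the column of bases against the
# consensus base and run-length encode the (mostly matching) result.

def encode_column(column_bases: str, consensus_base: str):
    # replace matches with a placeholder so that long runs of '=' emerge
    symbols = ["=" if b == consensus_base else b for b in column_bases]
    runs, last, count = [], symbols[0], 1
    for s in symbols[1:]:
        if s == last:
            count += 1
        else:
            runs.append((last, count))
            last, count = s, 1
    runs.append((last, count))
    return runs

# encode_column("AAAAACAAAA", "A") -> [('=', 5), ('C', 1), ('=', 4)]
```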

Reference-based approach

Having reads already mapped to the reference sequence, it can be more practical to encode the alignments with respect to the reference in order to further reduce the sequence data redundancy. Moreover, mappers usually store the information about sequence matching results in the mandatory SAM CIGAR field and the optional MD field, which allows downstream applications to perform variant analysis without looking at the reference sequence. Having access to the reference, the variation information from these fields can be either re-modeled into a more compact form or, possibly, omitted and re-generated during decompression.

MZip [58] was the first11 proof-of-concept solution exploring reference-based compression of SAM files. It encodes alignment data as follows. The alignment position is encoded with respect to the reference and relative to the previous alignment. Any information about sequence differences with respect to the reference is stored as an offset relative to the read's anchor position, along with the corresponding variants. These include: the variant type, the altered bases (in case of substitution or insertion) or the length (in case of deletion). Variant information is encoded using special binary codes and a Gamma code (for lengths). MZip also encodes the read's mate-pair information (if present) in a compact way, further reducing data redundancy. In case there are unmapped segments in the SAM file, MZip performs a de novo assembly on them. It tries to build possibly large contigs, which are later used as an alternative reference to encode the unmapped segments. The compression of the remaining fields was not supported by the authors, since MZip was more of a proof-of-concept solution designed to show the potential storage savings obtainable by applying reference-based compression. The ideas behind MZip were successfully implemented and improved in CRAMTools, which also provided the initial implementation of the CRAM format. The format and the compression methods have gone through a number of improvements since the initial release. For example, SCRAMBLE [18], which implements the CRAM format, makes it possible to embed the reference sequence in the compressed archive, eliminating the need for it to be explicitly provided during decompression.

11 Around the same time as MZip, SlimGene [93] was published; however, it compresses reads in the (today deprecated) Illumina Export format, which substantially differs from the SAM format.

HUGO [114] is another reference-based solution, but one offering remapping of the reads and using multiple reference sequences. It introduces three classes of mapped reads: the unmapped reads (UMR), the inexactly mapped reads with more than 4 mismatches (IMR) and the exactly matched reads (EMR). For the IMRs and UMRs, it splits the reads into two shorter ones, each time splitting them at half of their length. Next, it tries to remap the reads to different reference sequences using the SOAP3 [118] mapper. The resulting EMRs are output for compression, whereas the remaining IMRs and UMRs are further remapped. The remapping is performed iteratively until a specified number of iterations has been reached or until the fraction of unmapped reads is below a given threshold. The resulting record alignment fields are compressed using a mix of different codecs. FLAG, MAPQ and TLEN are compressed using Huffman coding. POS and PNEXT are differentially encoded with respect to the previous alignment, followed by a Huffman coder. RNAME and RNEXT are compressed using RLE, and CIGAR is compressed using LZW [206].

DeeZ [71] is a recently published SAM compressor achieving very high compression ratios. Similarly to the non-reference-based solution CSAM, it compresses alignments in blocks and constructs from the segments an updated consensus sequence. In this way, variants present in the alignments are encoded only once, in the consensus. Such an approach significantly reduces the amount of space required to store variant data. Moreover, it simplifies the representation of the CIGAR and MD fields, or even eliminates the need for storing them, as with respect to the updated consensus reference there may be no variant present in the alignments. The differences between the built consensus sequence and the provided reference are stored only once – they will be used to decompress the records. Such transformed alignment information, together with the other remaining fields, is stored in separate data streams, each of which is compressed using gzip. Only the alignment position is differentially encoded (with respect to the previous alignment's position), followed by an arithmetic coder.

2.2.5 Compression of quality scores

As mentioned previously, the majority of the methods used to compress quality scores in FASTQ files, both in lossless and lossy modes, can also be applied in the case of SAM files. For example, considering the lossless methods, Quip [85] compresses quality scores using the same method for both SAM and FASTQ files. Other tools, such as Samcomp [19], DeeZ [71] and SCRAMBLE [18], utilize quality compression methods derived from Fqzcomp [19]. Considering lossy compression of quality scores, DeeZ [71], for example, utilizes the lossy scheme derived from SCALCE [70], whereas CSAM [25] proposes a strategy similar to the one offered by Fqzcomp [19]. Moreover, RQS [216], although being a quality-only compressor, is based on the same method as the one proposed by Quartz [217]. Finally, some solutions, such as QualComp [152] and QVZ [126], are format-independent and can be applied to compress quality scores present in both FASTQ and SAM files. The principles behind the above-mentioned methods have been described in detail in Section 2.1.4.

Regarding the compression methods specific to the SAM format, an interesting approach to compressing quality scores was presented in MZip [58], where the authors proposed to support only partial storage of the quality values. Instead of storing quality scores individually per read (known also as a horizontal mode), they encode sequences of quality scores per genomic position, encoding them in a vertical mode. Primarily, quality scores are stored at the positions where the corresponding bases show differences with respect to the reference sequence. When the bases match the reference sequence, only a user-defined small percentage of quality scores per matching position is kept. Such transformed qualities are encoded using Huffman coding, encoding them in runs column by column (i.e., per position in the genome). This technique was later implemented in CRAMTools and extended to allow selective compression of quality scores, e.g., by applying different compression schemes depending on the variation type or the calculated base coverage. Moreover, in the current version of CRAMTools (and SCRAMBLE [18]) the 8-level quality score binning scheme proposed by Illumina [80] has been added as one of the lossy transformation methods.

NGC [162] proposes a similar approach to MZip and introduces four categories for compressing quality scores. The category selection depends on whether, at a given position (in genomic coordinates), all the bases (or only a part of them) in the segments match the reference sequence and, if not, whether the variant at that position suggests a multiallelic region. NGC then applies a different quantization of the quality value depending on the selected category. After quantization, the values are encoded either independently per alignment (the horizontal mode) or per position (the vertical mode). In the second case, NGC uses vertical run-length encoding (VRLE), which provides a significantly better compression ratio than encoding quality values per alignment. The final output is compressed using one of the general purpose compressors, such as gzip, bzip2 or an LZMA-based one.

Compression of optional fields

Regarding the compression of the optional fields, as mentioned in Section 1.1.3, in the SAM format they are represented as a tab-separated collection of fields, where each field is represented as a triplet of the form TAG:TYPE:VALUE. Therefore, the methods used to compress the query template names can also be applied to compress the optional fields.

A good example here is Quip, which utilizes the same compression methods to compress both query template names and optional fields. It applies tokenization on the optional fields and encodes the values using a different statistical model per type of field. Then, it encodes the values using an arithmetic coder, with the field tag identifier serving as an additional context. In a similar way, Samcomp [19] utilizes the read identifier compression routines from Fqzcomp [19] to compress the optional fields.

SAMZIP [176] performs tokenization of the fields and encodes only the field's tag and type information using Huffman coding, storing its value as it is. Then, it compresses the encoded fields using the general purpose compressor WinRAR. On the other hand, NGC [162] performs run-length encoding over a number of optional fields, compressing the encoded data using gzip, bzip2 or an LZMA-based compressor.

DeeZ [71] proposes a more sophisticated way to compress the optional fields. It tokenizes the fields and stores together, in separate streams, the values denoted by the same tag identifier and the same type. In this way, each stream stores values of a homogeneous type, possibly sharing the same characteristics. All the resulting streams are then compressed using gzip or bzip2. On a similar basis, SCRAMBLE [18] tokenizes all the fields and stores the values in separate streams, but offers more advanced encoding methods. For example, while encoding fields containing read group names or fields containing auxiliary quality scores (denoted by different tags), SCRAMBLE can apply a different encoding scheme best suited to the field's data characteristics. The final stream compression stage is performed either by applying a general purpose compressor like gzip or bzip2, or by using an rANS entropy coder with some custom model.
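A minimal sketch of this per-tag stream separation is shown below; the tag names in the example are illustrative and the choice of codec per stream is left out.

```python
# Group SAM optional-field values by (TAG, TYPE) into homogeneous streams,
# in the spirit of the DeeZ and SCRAMBLE approaches described above.
from collections import defaultdict

def split_optional_fields(optional_fields_per_record):
    streams = defaultdict(list)
    for record_fields in optional_fields_per_record:
        for field in record_fields:
            tag, typ, value = field.split(":", 2)   # TAG:TYPE:VALUE triplet
            streams[(tag, typ)].append(value)
    return streams

# Example:
# recs = [["NM:i:1", "MD:Z:50", "RG:Z:lane1"], ["NM:i:0", "MD:Z:23A26", "RG:Z:lane1"]]
# split_optional_fields(recs)[("NM", "i")] -> ['1', '0']
# Each stream holds values of one type and can be handed to the codec best suited to it.
```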

2.3 Alternative HTS data storage solutions

2.3.1 MonetDB – a column-oriented database

A database is an organized collection of data, which allows for easy access to, management of, updates of and queries on the data. In computing, a database is a collection of schemas, tables, queries, reports, views and other objects. The schema defines the structure of the database in a formal language. The schema is supported by a database management system (DBMS) and includes the definition of tables, fields, types, relationships between the objects, etc. A table is a collection of related data and is represented in a structured format consisting of columns and rows. A row is a single information item in the database. The information stored in the database is usually organized in a way that models aspects of reality or a specific use-case.

Databases can have a row-oriented (horizontal) or column-oriented (vertical) architecture. The most general difference between these two (from the data storage point of view) is as follows. The column-oriented database vertically partitions the database into a collection of individual columns which are stored separately. The row-oriented database stores the data compacted as tables. This internal architectural property has a direct impact on performance. The column-based systems enable queries to read just the attributes they need rather than having to read the entire rows from the disk and discard unneeded data once they have been loaded into memory [4]. Analogously, column-oriented databases have many opportunities to reduce storage space by applying compression on the data stored in columns, since the data type of the values in the same column is usually uniform.

When it comes to genomic data representation and storage, popular flat file formats keep the data stored record-wise, where each record consists of a number of separate fields. Therefore, the data can be easily decomposed into a number of separate streams or columns, which store data of homogeneous types. This property of genomic data is frequently exploited by format-specific compressors and can be a great fit for column-oriented databases. Moreover, since the data stored inside a database can be modeled in a more intuitive and descriptive way, storing the genomic data in a column-oriented database may allow us to perform advanced queries on the data, which is impossible to achieve by using format-specific tools.

As an example, in [30] the authors used the column-oriented DBMS MonetDB [78] as a proof-of-concept to store and analyze the sequenced genome of the Ebola virus. By using MonetDB to store alignment data, the authors were able to perform advanced queries on the data, mimicking a selected set of functionalities provided by SAMtools. Moreover, they could import multiple SAM/BAM files into a single database, each represented as a separate set of tables. Such a method allowed them to perform a joint analysis of the datasets.

Since the data stored inside a database can be modeled in a more descriptive way, the authors proposed two different schemas to store and represent SAM records. The first schema, called the Sequential storage schema, is a straightforward mapping of the alignment fields in SAM format to their respective columns in the database table. The alignments are stored in two tables. The first one stores the content of the mandatory alignment fields, whereas the second one stores the optional fields. Such a representation is, however, cumbersome for sequence analysis using paired-end sequencing data (which is the most common case), as the alignment pair reconstruction needs to be performed per analysis, which is a computationally expensive task. Therefore, the authors also introduced a second schema – the Pairwise storage schema. With this schema, primary and secondary alignment pairs are explicitly stored in separate tables, which allows for a more efficient query execution. In addition, unpaired alignments are stored in a separate table. The optional fields are stored as in the case of the sequential storage schema.

Since MonetDB focuses on providing fast performance for data analysis, it physically stores the data in the same way both in memory and on disk. Therefore, it implements only minimal compression of the data. It optimizes the columnar representation of the data and tries to represent it as a dense array. It also encodes string data using dictionary encoding. This approach has significant advantages versus row-oriented databases, in which empty attributes usually need to be explicitly stored. However, in the case of storing large volumes of genomic data, such an approach may not be efficient. Storing BAM files in a MonetDB database unfortunately has a significant overhead [46] versus storing them in BAM format – it requires at least 3-4 times the storage space of the BAM file itself.
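The sketch below illustrates dictionary encoding of a string column, the kind of lightweight compression mentioned above; it illustrates the idea only and is not MonetDB's internal implementation.

```python
# Dictionary encoding of a string column: distinct strings are stored once in a
# dictionary and the column itself becomes a dense array of small integer codes.

def dictionary_encode(column):
    dictionary = {}          # string -> integer code
    codes = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return dictionary, codes

# Example on an RNAME-like column with very few distinct values:
# dictionary_encode(["chr1", "chr1", "chr1", "chr2", "chr1"])
# -> ({'chr1': 0, 'chr2': 1}, [0, 0, 0, 1, 0])
# The integer array is much cheaper to store and scan than the raw strings.
```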

2.3.2 cSRA format

The Sequence Read Archive (SRA) is a large bioinformatics database which provides an open public repository for DNA sequencing data. It is a part of the International Nucleotide Sequence Database Collaboration (INSDC), and is run as a collaboration between the NCBI, the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). Its focus is on efficient archival and distribution of sequencing data, especially of the short-read data generated by high-throughput sequencing platforms. The majority of the data stored in the repository comes from human or human-related sequencing projects, such as the 1000 Genomes Project [1] and the Human Microbiome Project [159]. In order to support the submission of data in multiple formats (some of which can be machine-specific) and an efficient distribution of the archived data, a special internal data format was developed.

The SRA format [103] (later replaced by cSRA) is a special column-oriented data format designed with column-oriented databases in mind. It allows for efficient storage of genomic data (as submitted in FASTQ or SAM format) by decoupling the content of the record fields and encoding the data column-wise. Therefore, it allows for accessing a reduced set of columns and for the removal of individual columns in the file. Additionally, in cSRA the genomic data format is defined by a special and flexible schema.

SRA Toolkit12 is a set of utilities supporting the cSRA format and providing an API to store and access genomic data of different types. The primary use of the SRA Toolkit is the submission of data to an SRA archive. However, local storage of the data is also possible. Storing a genomic file of a specified format into a cSRA archive involves a number of data preprocessing steps13. Firstly, the input file is parsed by a format-specific parser and the records' content is split into a number of separate (input) columns. Next, the values in the input columns go through a number of data transformations to possibly simplify the initial representation and/or to reduce the number of output columns required to represent and store the transformed data. The columnar data are encoded and serialized to disk. The content is distributed into a number of different directories (representing the hierarchical data structure) and files (representing the columns), with additional meta-information and indexing files. Such a data structure can be further compressed into a single cSRA archive file to enable efficient sharing of the content. To decompress the content, selected columns in the archive are deserialized, followed by decoding and data transformation steps. The decoded output will match the format of the input data as specified in the schema. In addition, while creating the archive, SRA Toolkit allows one to include a set of useful meta-information to accompany the data – for example, tracking information about the used reference sequences, which can be downloaded during decompression or which can be accessed directly while residing in the local filesystem.

12 https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc
13 Described in https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=doc

When storing DNA sequencing data, it is recommended to store it in the aligned form. The SRA Toolkit provides an efficient way to store SAM files by exploiting reference-based compression. Moreover, it can store only the sequence variants with respect to the used references, greatly reducing the sequence data redundancy. SRA Toolkit also provides an option to compress the input data with a controlled degree of information loss. Some lossy approaches include removing the data from read identifiers, performing lossy compression of quality scores or eliminating the unaligned reads. As a good use-case example, SRA archives hold the compressed data from the 1000 Genomes Project. The aligned files in BAM format initially required 250 TB of disk space to store. However, by applying semi-lossless compression the required storage size was reduced to 85 TB. The operations included stripping off selected information from query template names (and saving them as unique integers) and performing 40-level binning of read qualities using recalibrated quality scores. Furthermore, by applying an 8-level binning (as proposed by Illumina) and uniform binning of recalibrated quality scores, the overall size has been reduced to 30 TB, i.e., 12% of the original BAM size. By contrast, the variant calling analysis results stored in compressed VCF format occupy 0.1 TB.

2.3.3 HDF5 format

HDF514 [57] is a technology framework consisting of a data model, a library and a file format for storing and managing data. It supports a variety of different data formats and data types and is designed for efficient I/O operations on high-volume and complex data. The (possibly complex) data format can be defined by a user using a set of provided basic HDF5 data types. Moreover, HDF5 allows one to define additional operations (called filters) to be performed on the data during I/O operations – for example, compression filters, which include ZLIB15 (upon which gzip is based) or SZIP16. Finally, HDF5 is a mature technology with multiple wrappers and bindings available in different programming languages (such as C, C++, Java, Python, R) and is supported by large scientific applications (such as MATLAB).

14 http://www.hdfgroup.org/HDF5
15 http://www.zlib.net/
16 http://www.compressconsult.com/szip/

Data stored in HDF5 format is treated as an HDF5 information set and is represented as a single HDF5 file. Each information set is a container holding annotated associations of array variables, groups and types, with a designated root inside. Each item contained in HDF5 format is an HDF5 information item. Inside the information set, the annotated associations define HDF5 datasets, HDF5 groups, HDF5 links and custom HDF5 data objects, with optional annotations – HDF5 attributes. HDF5 datasets are array variables of homogeneous data type. They are logically shaped as a multidimensional array, which is defined by its rank (number of dimensions), along with the current and maximum extent of the respective dimensions. The array elements can be either of a predefined HDF5 data type or of a custom HDF5 object type. An HDF5 object type is a user-defined complex data type, composed from HDF5 data types. The available types include: integer, floating-point, string, bit-field, opaque (a data type whose concrete data structure is not defined), compound, reference, enumeration, variable-length sequence and array. HDF5 groups represent a similar concept to directories present in file systems and define an association between zero or more HDF5 information items. Analogously, HDF5 links allow referencing between HDF5 information items residing inside an HDF5 file. Finally, HDF5 attributes allow the annotation of individual HDF5 datasets, groups and datatype objects with extra meta-data.
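The minimal sketch below uses the h5py Python bindings to illustrate the concepts of groups, datasets, attributes and compression filters described above; the file name and layout are made up for the example and do not correspond to any particular bioinformatics format.

```python
# Illustrative use of HDF5 groups, datasets, attributes and the ZLIB-based ("gzip")
# compression filter through the h5py bindings.
import h5py
import numpy as np

with h5py.File("example_reads.h5", "w") as f:
    grp = f.create_group("sequences")                       # an HDF5 group
    quals = np.random.randint(0, 41, size=(1000, 100), dtype="uint8")
    dset = grp.create_dataset("qualities", data=quals,      # an HDF5 dataset
                              compression="gzip",           # ZLIB-based filter
                              compression_opts=6, chunks=True)
    dset.attrs["phred_offset"] = 33                         # an HDF5 attribute

with h5py.File("example_reads.h5", "r") as f:
    subset = f["sequences/qualities"][:10]                  # selective (sliced) read
```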

Typical applications of the HDF format are found in research areas generating and operating on large volumes of multi-dimensional data. As an example, NASA adopted the HDF5 format to support the NASA Earth Observing System (EOS)17 [88] science data. Similarly, in bioinformatics, to efficiently deal with large volumes of genomic data, the BioHDF5 [135] format was developed as an alternative to the SAM/BAM format. It proposes storing multiple files, such as SAM/BAM files, inside one container, optionally with their corresponding reference sequence files. The data inside a container are divided into three groups. All the DNA sequence data are stored in the sequences group as a table with bases represented as columns and reads represented as rows. Reference sequences – stored in the references group – are represented as very large concatenated character strings with an index to access individual sequences. Alignment descriptions – stored in the alignments group – are stored decomposed into a number of separate subgroups, corresponding to the field names. Such decomposition allows for selective access to the contained data. The representation of the data is further compressed using the standard HDF5 library lightweight compression filters, such as ZLIB or SZIP.

17 https://earthdata.nasa.gov/standards/hdf-eos5

Unfortunately, BioHDF5, although providing a very flexible approach for storing and representing data of any format, introduces a significant overhead for representing the alignment data. The compressed size and compression performance are inferior to those of the BAM format. The stored alignments occupy more space than compressed BAM files, with longer compression (import to the HDF5 container) and decompression (export from HDF5 to a SAM file) times.

Another solution based on the HDF5 format is SeqDB [75] – it uses HDF5 to store and compress FASTQ files. SeqDB packs both sequence and quality data together into one byte and compresses them using the external filter BloscLZ18. Read identifiers are also compressed using BloscLZ. In this way, SeqDB provides very high compression and decompression throughput, but at the cost of a low compression ratio compared to the commonly used gzip.
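The sketch below illustrates the idea of packing a base call and its quality score into a single byte; the 2-bit/6-bit split is an assumed layout chosen for the example and is not necessarily SeqDB's exact encoding.

```python
# Pack a base (2 bits) and a quality value 0..63 (6 bits) into one byte.
# Assumed bit layout for illustration only, not SeqDB's actual format.

_BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_CODE_TO_BASE = "ACGT"

def pack(base: str, quality: int) -> int:
    return (_BASE_TO_CODE[base] << 6) | (quality & 0x3F)

def unpack(byte: int):
    return _CODE_TO_BASE[byte >> 6], byte & 0x3F

def pack_read(sequence: str, qualities: list[int]) -> bytes:
    # 'N' calls would need a separate escape mechanism; omitted in this sketch
    return bytes(pack(b, q) for b, q in zip(sequence, qualities))

# pack("G", 38) -> 0b10100110; unpack(pack("G", 38)) -> ('G', 38)
```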

2.3.4 GOBY – a data management framework and data format

Goby [24] is a genomic data management framework designed to facilitate the implementation of efficient genomic data analysis pipelines. It provides efficient storage of FASTQ and SAM files (with the focus on the latter) using its own format, where the input genomic file format is defined in a special schema. Such an approach has already allowed for the integration of the Goby framework with such third-party tools as the sequence aligner BWA [110], the interactive data visualization tool IGV [173], the data analysis and management platform GobyWeb [45] and more.

Genomic data formats (FASTQ and SAM) in Goby are represented using Protocol Buffers19. Protocol Buffers is a data interchange format developed at Google; it is a language-neutral, platform-neutral, flexible mechanism for serializing structured data (stored as messages) which automates reading and writing of the data. The record type (called a message) is defined using the domain-specific Protocol Buffers Interface Description Language. By using Protocol Buffers and by providing an explicit definition of the message type, Goby eliminates the need for implementing manual parsing and serialization mechanisms for the message data type. Moreover, in order to support storing large datasets (as present in genomics), the authors extended the Protocol Buffers message storage back-end by introducing the Goby Large Collection Storage Protocol (GLCSP). GLCSP allows one to store collections of messages serialized by Protocol Buffers, compacted in blocks of a specified size. Additionally, GLCSP provides functionality for random access, which is performed on a per-block basis.

18 http://www.blosc.org/
19 https://developers.google.com/protocol-buffers/

Goby provides a set of 4 different compression techniques to compress the data packed inside blocks. In the separate field encoding technique, the message's content is decoupled and the fields of the same type are compressed together. The field modeling technique allows the value of one field to be expressed as a function of other fields and constants, reducing the data overhead. The template compression technique detects whether a subset of the message (the template) repeats in the currently processed block; it then stores the template with the number of its repetitions – a technique similar to run-length encoding. Finally, the domain modeling technique allows for an advanced representation of the message in compressed blocks by, e.g., replacing selected content of the message with links pointing to a different message sharing the same content. In this way, when decoding a message, previously decoded messages can be used to re-generate some of the current message's content “on-the-fly”.

Apart from the compression techniques, a set of compression codecs have been implemented to compress the data. A general purpose gzip or bzip2 codec can be used to compress the data serialized directly by Protocol Buffers in blocks. The authors also implemented a specialized Hybrid Codec designed for compressing alignments, in which each field is encoded separately. It works as follows. Firstly, string values and floating point values are converted into lists of integer values. Then, the values are analyzed to create per-field statistics. Depending on the results, optional data transformations, e.g., run-length encoding, are applied to the fields. The output is encoded using one of the implemented codecs, such as an arithmetic coder or a minimal binary encoder (i.e., one using the minimum number of bits per symbol). Compressed streams are stored in a set of files, which are classified in Tiers. Accordingly, files storing information from raw reads (FASTQ) can be classified as Tier I. Sorted and indexed alignment files (SAM) can be stored in another set of files classified as Tier II. Moreover, data stored in Tier II can be directly linked with the data stored in Tier I, which allows for a possible reduction of data redundancy when storing both SAM (aligned) and FASTQ (unaligned) files in a traditional file-oriented way. Unfortunately, due to the complex architecture and the different internal record representation of SAM alignments (and FASTQ reads), the offered data processing speed is relatively slow (e.g., resources are spent on format transcoding), hence limiting its practical usability as a possible replacement of the BAM format.

2.3.5 ADAM – a cloud-oriented genomic data processing ecosystem

ADAM [136, 150] defines an alternative approach with respect to classical pipelines, which usually consist of a set of applications run sequentially and which use different intermediate file formats for storing and exchanging the data. ADAM is an open-source programming framework, a set of application programming interfaces (APIs), and a set of data formats for genomic data processing in the cloud, primarily focused on efficient data processing and scalability. It has been built on top of open-source Apache big data processing frameworks [218, 208], which allow one to perform distributed computations on the data. Additionally, it provides a custom data format with a focus on data interoperability between different genomic data processing applications. Sample applications which have been developed on top of ADAM are the variant caller Avocado [149] and Mango20, a scalable genome browser.

To achieve interoperability between different applications, ADAM utilizes a set of open-source solutions such as Apache Avro21 and Apache Parquet22. Apache Avro is a data serialization system which is also used to define genomic data formats (similar to Google Protocol Buffers, used by Goby). Apache Parquet is a columnar data storage format which shares underlying ideas with column-oriented databases. It allows the data to be stored as a single file or as a distributed one, i.e., one large file partitioned into multiple smaller chunks, each stored on a different node in the networked distributed filesystem (such as the Hadoop Distributed File System (HDFS)). Since the Apache Parquet format is mostly focused on providing rapid data access for performing efficient analyses over the data, it implements only some lightweight encoding schemes, such as run-length encoding or differential encoding. It also provides some lightweight compression codecs, such as gzip and Snappy23. Thanks to such decisions, different genomic record types can be easily defined in ADAM and stored in the filesystem without the need for developing specialized tools to handle different file formats.

20 https://github.com/bigdatagenomics/mango
21 https://avro.apache.org/
22 https://parquet.apache.org/
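The sketch below stores a few alignment-like records in the Parquet columnar format using the pyarrow library directly (ADAM itself defines its records with Avro schemas and runs on Spark); the field names are illustrative.

```python
# Columnar storage of alignment-like records in Parquet with Snappy compression,
# plus a column-selective read. Uses pyarrow, independently of ADAM.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "rname": ["chr1", "chr1", "chr2"],
    "pos":   [10_000, 10_012, 53_000],
    "mapq":  [60, 60, 37],
    "seq":   ["ACGTACGT", "ACGTACGA", "TTGCA"],
})
pq.write_table(table, "alignments.parquet", compression="snappy")

# Column-selective read: only the requested columns need to be touched on disk.
positions = pq.read_table("alignments.parquet", columns=["rname", "pos"])
```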

As a proof-of-concept, an implementation of the SAM format in ADAM allowed output files to be obtained that are up to 25% smaller than their corresponding representation in BAM format. The reduction in data storage size seems promising; however, the main obstacle of ADAM is that it is heavily dependent on the Hadoop ecosystem and its computational and data storage architecture. Therefore, its integration with current genomic data processing workflows can be very difficult.

2.4 Storage and compression of long-reads data

2.4.1 Context

The third-generation single-molecule sequencing platforms can output reads with lengths of tens of kilobases (and more). The technology potentially opens up a lot of opportunities for discovering large structural variations present in genomes, which are very difficult or impossible to assess when using short reads. However, single-molecule sequencing technologies are still in development and the accuracy of the generated long sequencing read data is still inferior compared with that of the data generated by second-generation short-read platforms (see Table 1.1 for a comparison between the different platforms). With the continuous improvements of sequencing technology, base calling algorithms and chemistry, the format and the content of the generated sequencing data are also subject to change.

23 http://google.github.io/snappy/

Currently, TGS reads are accompanied by auxiliary data (such as raw signal data) and meta-information (such as information related to the used instrument), which can be helpful to analyze and possibly correct the sequencing errors and to improve the accuracy of the analyses. It is clear that the capabilities of the FASTQ format with respect to storing auxiliary data and meta-information are very limited. Therefore, a constantly improving technology requires a special, resilient format for storing the raw data coming from the sequencers. In the case of Pacific Biosciences and Oxford Nanopore Technologies, which are the leading third-generation sequencing platforms, the sequencing data are stored in HDF5 format. The HDF5 format provides a high degree of flexibility and allows for easy incorporation of heterogeneous data along with the DNA reads. Such a decision, however, also requires adapting existing applications and data processing pipelines to be able to work with the new format, or it requires developing converters to existing formats (such as FASTQ or SAM), which will always entail a controlled degree of information loss.

Moreover, data compression techniques which have been successfully applied to short-read data may not provide satisfactory compression results when applied to the long-read data generated by third-generation sequencing platforms. First of all, the primary application of TGS data is in de novo sequence assembly experiments rather than in re-sequencing. The data generated by TGS platforms is also characterized by a considerably lower coverage than the short-read data, as the reads can span longer genomic regions and fewer of them need to be sequenced (see Eqn. 1.1 and 1.2 for the relation between the genome length, the length of the reads and the number of reads required to adequately cover the genome). Hence, for example, the read-reordering techniques used when compressing raw FASTQ files, aiming to exploit the high redundancy present in short DNA sequences, may not be a good choice here. In addition, compared to the data generated by NGS platforms, the TGS data is characterized by a significantly higher degree of noise, primarily due to sequencing errors. As a consequence, the majority of the space is occupied by the auxiliary data, which can be used to possibly improve the quality of the obtained DNA sequences. Therefore, the commonly used short-read sequence matching, mapping, assembly and compression methods would require significant modifications in order to be efficiently applied to long-read data.

2.4.2 Pacific Biosciences platform

Data generated by Pacific Biosciences sequencers are stored in a set of HDF5 files – one main bas.h5 file and three bax.h5 files [156]. Inside each of them, the data layout is structured in a hierarchical way, which allows for a comprehensive and consistent representation of the data produced by the sequencer. The main file is primarily used as an index file and contains the mapping information necessary to access the data generated by each ZMW well (see Section 1.1.2), which are stored in the bax.h5 files. These contain raw sequencing data, such as signal pulses classified by a base caller with their timestamps (called events), probabilities of base calling errors, probabilities of INDEL occurrences, whether the read passed filtering, etc. In addition to the event data, these files contain technical information about the sequencer, such as ZMW well parameters, changes of their state over time, their associated metrics and others. As the format is constantly evolving, no specific compression methods have been defined – the data inside the HDF5 containers are stored using default HDF5 format settings.

Since the initial release of the first PacBio RS I sequencer in 2010, the platform, the available chemistry and the protocols have undergone a significant number of improvements, reaching as of 2017 a high level of maturity. A large number of bioinformatics tools to perform data processing and analysis have been developed by Pacific Biosciences24 and published as open-source. This step helped a wider adoption of the platform and allowed for the development of novel methods for working with long-read data. Moreover, specific extensions for PacBio-generated data have been added to the SAM/BAM format specification. As a result, long-read data (both mapped and unmapped) can be converted to the SAM alignment representation, but at the cost of stripping out a large amount of auxiliary data. Such a move allows long reads to be potentially used with pipelines and tools which primarily accept only SAM format as input.

24 http://pacbiodevnet.com/

2.4.3 Oxford Nanopore Technology platform

Similarly to the approach taken by Pacific Biosciences, data generated by Oxford Nanopore Technologies sequencers are stored as a set of files in HDF5 format, namely the FAST5 format, with the corresponding .fast5 file extension. However, the main difference is that one FAST5 file stores the data from only one sequenced DNA molecule. Therefore, a huge number of files is produced by each sequencing run, where the total number can vary from thousands to millions of files and depends on the number of sequenced DNA molecules. Such an approach allows for instant and continuous analysis of the data while sequencing, with the analysis results being continuously updated once new molecules have been sequenced. This also has a direct impact on how the sequencing data is represented internally.

In the case of the FAST5 format, the raw sequencing data stored inside the files are also structured in a hierarchical way, following the same schema, which allows for a comprehensive representation of the data coming from sequencing experiments. The raw sequencing data primarily consist of a time-series of nanopore translocations (events) with the corresponding detected signal strengths and nucleotide probabilities. Depending on the experiment setup, the FAST5 files may also include the results from the base caller, both for 1D and 2D configurations (see Section 1.1.2), with additional post-processing information and a summary. Moreover, since the generated data can be directly processed in the cloud, in order to help with the identification of the reads and the coordination of the analysis, each FAST5 file contains unique tracking information about the read, the originating pore, the instrument and the used pipeline (with its tools). Therefore, a large degree of meta-information is shared between reads coming from the same experiment.

The Oxford Nanopore Technologies platform and the available chemistry are still in their infancy, but are constantly being improved. This directly affects the data storage format, which is subject to change with the possibility of introducing improved chemistry or pore technology. Despite the difficulties around the format, a number of bioinformatics tools have been developed to support the analysis of the produced data. Moreover, in order to overcome the challenges of constantly evolving formats, a number of “wrapper” tools have been developed which allow the conversion of reads stored in FAST5 to FASTA or FASTQ, such as poretools [121]. This allows for further adoption of the long reads produced by Oxford Nanopore Technologies by a broader range of bioinformatic tools and pipelines.

OBJECTIVES

Sequencing technologies are continuously improving. Year after year the offered throughput greatly increases, while operational costs become progressively smaller. As has been shown in Chapter 1, efficient storage and processing of genomic information poses a lot of challenges. Data can be generated by multiple technologies, and its typical size ranges from tens to hundreds of gigabytes per single sequencing experiment. Moreover, downstream analysis generates multiple, large intermediate files that have different data formats. Therefore pipelines processing genomic data are very I/O intensive, which is currently one of the major computational bottlenecks of analyses. Apart from difficulties in processing such large volumes of data, efficiently sharing and querying data sets that are hundreds of gigabytes in size, or several terabytes, is even more challenging.

As a result, as described in Chapter 2, a number of compression methods to store data in both FASTQ and SAM formats have been developed. Unfortunately, the majority of compressors developed so far focus on maximizing the compression ratio and do not pay attention to the speed of data processing or to reliability, which limits their practical usability within pipelines for genomic applications. In addition, the majority of raw or aligned data is still stored in gzip-compressed form (for files in FASTQ format) or using the gzip-based BGZF compression method (for files in BAM format), which do not provide satisfactory compression results.
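As a small aside illustrating the BGZF point above: BGZF blocks are valid gzip members, so a BAM file can be decompressed with a plain gzip reader, and the first decoded bytes are the BAM magic string. The sketch below assumes an intact BAM file with a hypothetical name.

```python
# A minimal sketch: BAM files use BGZF, a gzip-compatible block compression,
# so Python's gzip module can decompress the stream; the first four decoded
# bytes of a BAM file are the magic "BAM\1".
import gzip

def looks_like_bam(path):
    with gzip.open(path, "rb") as f:
        return f.read(4) == b"BAM\x01"

# print(looks_like_bam("alignments.bam"))  # hypothetical file name
```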

Therefore, the research presented in this thesis focuses on methods for efficiently compressing and storing the data generated by high-throughput sequencing technologies. Not only would such methods be easier to incorporate into current genomic data analysis pipelines; they would also potentially eliminate the efficiency bottlenecks present in such pipelines, and enable a more effective sharing of HTS data.


In particular, our main objectives have been:

1. As a starting point, explore genomic data compression methods and develop a general, high-performance compressor for FASTQ files, which can serve as a good alternative to the most commonly used gzip.

2. Explore and develop methods allowing researchers to efficiently compress short-read data generated by sequencing experiments, especially those having high coverage. Such methods should exploit the significant sequence redundancy present in the data, providing a high compression ratio without using any external reference sequences.

3. Design and develop a flexible, configurable and format-independent solution which enables efficient storage and retrieval of genomic data no matter how the latter has been originally stored. It should be possible to use the solution as a replacement for any specific genomic file format and to perform simple queries on the compressed data, allowing the user to easily prototype new compression methods suitable for HTS information. As a proof of concept, the solution should be able to generate a range of different compressors for short-read data, be it originally represented either as raw reads in FASTQ format or alignments in SAM format.

4. Finally, briefly explore approaches allowing one to perform lossy compression of HTS data, in order to achieve additional savings in the amount of storage required. Information loss should happen in a controlled way, preserving the quality of the biological results that can be obtained from the data.

CHAPTER 3

RESULTS

In this chapter we will present the results of the solutions for compressing and storing HTS data which we developed during our research. They cover a broad range of use-cases for genomic data storage and were extensively tested by comparing them with the state-of-the-art techniques presented in Chapter 2.

Firstly, we will present DSRC2 [174], a general, high-performance FASTQ format compressor. It is based on the ideas of the previously published DSRC [42] compressor, implementing a number of improvements and novel compression methods.

Because of the limitations of general methods when compressing short-read datasets at high coverage, we will then shift our attention to a different approach. We will present ORCOM [66], a proof-of-concept DNA-only compressor for short-read data, which focuses on providing a maximum compression ratio for short DNA reads. It uses a novel read reordering method, which aims to exploit the high DNA sequence redundancy present in datasets coming from deep sequencing experiments.

In the next step, we will present FaStore, a full FASTQ format compressor based on the ideas presented in ORCOM and DSRC2. It extends the DNA compression method of ORCOM by introducing additional read reordering steps and by applying a sequence-assembly step in order to further improve the DNA compression ratio. Moreover, it also provides a set of different compression methods for storing read identifiers and quality scores in a lossy way.

Then, in contrast to specialized format-specific solutions, we will present CARGO [175], a framework and data format that allows one to semi-automatically generate compressors not tied to any specific HTS data format.

This allows for fast prototyping of HTS data storage solutions, where the record data type, along with possible additional data parsing, transformation and querying operations, is defined by the user. Moreover, all the datasets of heterogeneous types are stored in compressed form in configurable CARGO containers.

Throughout our research we have been comparing our methods with a constantly improving state-of-the-art, using different input datasets. Therefore, in the next step, we will briefly compare the compression results obtained by all our specialized (DSRC2, FaStore) and semi-automatically generated (CARGO-based) compressors, while at the same time comparing them to the current state-of-the-art and using a concise input dataset.

Finally, all the developed solutions along with their compression results will be discussed in Chapter 4, which also provides a look at some novel approaches and future directions.

3.1 DSRC2 – Industry-oriented compression of FASTQ files

Roguski L, Deorowicz S. DSRC 2 – Industry-oriented compression of FASTQ files. Bioinformatics. 2014 Aug 1;30(15):2213–5. DOI: 10.1093/bioinformatics/btu208

3.2 ORCOM – Disk-based compression of data from genome sequencing

Grabowski S, Deorowicz S, Roguski Ł. Disk-based compression of data from genome sequencing. Bioinformatics. 2015 May 1;31(9):1389–95. DOI: 10.1093/bioinformatics/btu844

3.3 FaStore – A space-saving solution for long-term storing of raw sequencing data

Roguski Ł, Ochoa I, Hernaez M, Deorowicz S. FaStore – A space-saving solution for long-term storing of raw sequencing data. (Manuscript in preparation)

FaStore – A space-saving solution for long-term storing of raw sequencing data

Łukasz Roguski 1,2, Idoia Ochoa 3, Mikel Hernaez 4, Sebastian Deorowicz 5

Abstract

The affordability of DNA sequencing has led to the production of unprecedented volumes of raw sequencing data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. The proposed algorithm does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the needs. We demonstrate through extensive simulations that FaStore achieves a significant improvement in compression ratio over previously proposed algorithms for this task. In addition, we perform an analysis of the effect that the different lossy modes have on variant calling, the most widely used application for clinical decision making, especially important in precision medicine. We show that lossy compression can offer significant compression gains, while preserving the essential genomic information and without affecting the performance of variant calling.

[Supplementary material is available for this article.]

1 CNAG-CRG, Centro Nacional de Análisis Genómico (CNAG) – Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain. 2 Universitat Pompeu Fabra (UPF), Barcelona, Spain. 3 Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA. 4 Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, IL, USA. 5 Institute of Informatics, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland. Correspondence should be addressed to [email protected]


Introduction

The growing interest in applications of genome sequencing, together with their dropping costs and continuous improvements in sequencing technologies, has led to the generation of unprecedented volumes of increasingly large and ubiquitous raw genomic data sets (Stephens et al. 2015). These data are characterized by highly-distributed acquisition, massive storage requirements, and large distribution bandwidth. For example, the 1000 Genomes Project (Clarke et al. 2012) requires 260 Terabytes of storage (for raw sequences, alignments, and variant calls). Moreover, the 100,000 Genomes Project (2017) has already exceeded 21 Petabytes in size. Such a flood of data hampers the efficiency of data analysis protocols, limits efficient data sharing, and generates vast costs for data storage and IT infrastructure (Schadt et al. 2010). This situation calls for state-of-the-art, efficient compressed representations of the raw genomic data, which can not only alleviate the storage requirements, but also facilitate the exchange and dissemination of these data.

The raw high-throughput sequencing data are primarily stored in FASTQ files (Cock et al. 2010), which are usually considered the input for genomic data processing and analysis pipelines. A FASTQ file can be perceived as a collection of “reads”, each containing a sequence of nucleotides (generally referred to as the read), the quality score sequence that indicates the reliability of each base in a read, and the identifier, which usually contains information about the sequencing instrument, flow cell coordinates, etc. Millions of such reads are produced in a single sequencing run. As a result, storing the raw reads coming from whole-genome sequencing of a single human individual can easily exceed 200 GB in uncompressed form.

One of the most common protocols using DNA sequencing is the analysis of data from re-sequencing experiments, assessing the variants against a known reference genome. In order to assess these variants, the raw reads present in the FASTQ file are firstly mapped to a reference sequence of the analyzed species. This process results in the generation of aligned reads in SAM format (Li et al. 2009), which contains the same information as the FASTQ file, together with the alignment information for each read, and with possible additional information provided by the mapper. Hence, the size of the intermediate data stored in SAM files is even larger than that of the input FASTQ files. Following a number of post-processing steps, the aligned reads are finally used to perform variant calling against the reference genome. The results of variant calling, following a number of additional data cleaning and validation steps, can be used as an entry point for further clinical analyses. However, one needs to note that the size of the resulting file is orders of magnitude smaller than that of the input raw reads stored in FASTQ format or the intermediate alignment data stored in SAM format.
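As a side illustration of the FASTQ record layout described above (each record spans four lines: identifier, nucleotide sequence, a separator line, and the quality string), a minimal parser can be sketched as follows; this is not FaStore code and the file name is hypothetical.

```python
# A minimal sketch of reading FASTQ records: four lines per record
# (identifier, sequence, '+' separator, quality string).
from itertools import islice

def read_fastq(path):
    """Yield (identifier, sequence, quality) tuples from an uncompressed FASTQ file."""
    with open(path) as f:
        while True:
            chunk = list(islice(f, 4))
            if len(chunk) < 4:
                return
            ident, seq, _plus, qual = (line.rstrip("\n") for line in chunk)
            yield ident[1:], seq, qual   # drop the leading '@' of the identifier

# for ident, seq, qual in read_fastq("reads_1.fastq"):   # hypothetical file
#     assert len(seq) == len(qual)
```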

Table 1: Datasets used in the experiments. WGS stands for Whole Genome Sequencing, WES for Whole Exome Sequencing.

Data set  Species      Experiment type  Sequencer             FASTQ size [GB]  Read len. [bp]  No. reads [M]  Coverage
HS2       H. sapiens   WGS              Illumina GA II                  336.9             100         1339.7        43
GG        G. gallus    WGS              Illumina GA II                  115.9             100          347.3        32
HSX       H. sapiens   WGS              Illumina HiSeq X                292.1             151          819.1        39
WGS-14    H. sapiens   WGS              Illumina HiSeq 2000             115.6             101          447.1        14
WGS-42    H. sapiens   WGS              Illumina HiSeq 2000             337.3             101         1304.5        42
CE        C. elegans   WGS              Illumina GA II                   17.6             100           67.6        66
WEX       H. sapiens   WES              Illumina HiSeq 2500              44.7             126          150.4      ~220
WGS-235   H. sapiens   WGS              Illumina HiSeq 2000            1888.4             101         7363.7       235

Although the data contained in the FASTQ file could potentially be recovered from the corresponding SAM file (or its compressed version), there may be cases in which this is impossible, depending on the protocols used for data analysis. For example, when performing post-processing of the alignments, some of the information can be discarded, such as the reads that failed to align to the reference genome or that were considered duplicates. Moreover, the reference genome used for storing alignments in a compressed form may no longer be accessible during decompression. When performing other types of experiments, such as metagenomics, there may not even be a reference genome at all to align the reads against, as the different organisms present in the sequenced sample are generally unknown prior to the analysis. Therefore, we focus on the compression of the information stored in FASTQ format, i.e., the raw data containing only the DNA (nucleotide) sequences, read identifiers, and quality scores, since they represent a minimal subset of the data required for future reproduction of the analyses performed on the sequenced data.

The existing specialized solutions for FASTQ file compression, extensively examined in (Numanagić et al. 2016), obtain significant compression gains over general compression tools such as gzip. However, in practice, gzip is still the de facto choice, mainly due to its popularity and stability. It seems that the community has not yet decided that the assets of specialized FASTQ compressors are worth the complications that may appear when moving to a different storage format. Also, the fact that a number of good dedicated compressors are currently available does not make the right choice simple. In this paper we propose FaStore, a new compressor for FASTQ files that, among other uses, can serve for long-term archival of raw genomic information and efficient sharing, especially with limited internet bandwidth. FaStore inherits the assets of our previous attempts in the field, especially DSRC (Deorowicz & Grabowski 2011; Roguski & Deorowicz 2014), ORCOM (Grabowski et al. 2015), and QVZ (Malysa et al. 2015; Hernaez et al. 2016). The proposed compressor offers both lossless and lossy compression modes (the latter only for the quality scores and the identifiers), and does not use any external reference sequences. We show that FaStore significantly outperforms the existing compressors in the lossless mode. We, however, advocate for the lossy option when suitable, which, as presented, gives much better shrinkage of the input files with negligible differences in variant calling. Although we emphasize variant calling as the most common use case in precision medicine, the analyses performed on FASTQ files are not limited to variant calling; they also include, e.g., gene expression analysis, assembly and metagenomics.

Results

Lossless and lossy compression of sequencing data

FaStore is a compressor for FASTQ files produced by next-generation sequencing platforms, which are characterized by generating massive amounts of short-read data with a relatively low sequencing error rate. Moreover, the generated sequence data contain a high degree of sequence redundancy, especially when coming from deep sequencing experiments. Hence, FaStore includes several compression modes to account for the different needs that users may have. In particular, parts of the data can be optionally discarded or quantized for additional file size reduction. The only strict requirement is that the DNA sequences are stored losslessly. Due to the different nature of the components of reads (DNA sequences, quality scores, identifiers), FaStore uses different specialized compression techniques for each of them (see Supplementary Methods for details). Moreover, as the sequencing data can be generated from a library in a single- or paired-end configuration, FaStore provides different techniques to handle both cases, guaranteeing that the pairing information between the reads is preserved (when available). In the following, when clear from the context, DNA sequences may also be referred to as reads.

Compression of the DNA sequences is done without the use of any external reference sequences. Relying on a reference sequence for compression requires the availability of the same reference at the time of decompression, which may no longer be accessible, thus making the compressed DNA sequences unrecoverable. Therefore, this design choice guarantees a perfect reconstruction of the sequences. Due to the nature of the sequencing process as a whole, the reads are generated in a random order (Kavak et al. 2015) and thus the initial ordering of the DNA sequences in the output file carries no significant information. With this in mind, FaStore exploits the existing high sequence similarity on a global scale by reordering the reads. In particular, the reads are clustered in such a manner that reads coming from neighboring positions in the sequenced genome are likely to belong to the same cluster. When possible, within these clusters the reads are assembled into contigs, so that they can be stored relative to the consensus sequences. Alternatively, a read can also be stored relative to other reads belonging to the same cluster, or “as it is”, depending on the degree of similarity with the other reads in the cluster. As a trade-off between the computation time to cluster the reads and the attained compression ratio, FaStore offers two modes of operation, denoted by C0 (fast) and C1 (default). When compressing in C1 mode, more clustering steps between the reads are performed to obtain a higher compression ratio. However, the decompression speed is similar for both modes.

While the sequence redundancy present in the data can be efficiently reduced, the quality scores have proven more difficult to compress (Bonfield & Mahoney 2013). They are characterized by high entropy with a significant level of noise. In addition, preserving precise quality scores is often unnecessary (i.e., some distortion is generally acceptable), in that no cost is incurred on the subsequent analyses performed on the data (Ochoa et al. 2016; Yu et al. 2015). Therefore, FaStore offers various lossy compression modes for the quality scores alongside the lossless one. In particular, FaStore includes Illumina 8-level binning (Illumina 2014), a custom binary thresholding, and an adaptive scheme based on QVZ (Malysa et al. 2015; Hernaez et al. 2016). Illumina binning reduces the resolution of the quality scores to just 8 distinct bins. The binary thresholding quantizes the quality scores according to a user-provided threshold, i.e. it sets the quality values below the threshold to qmin, and those above to qmax. QVZ quantizes the quality scores so as to minimize the rate allocation (number of bits per quality score) while satisfying a distortion constraint. To design the appropriate quantizers, QVZ relies on computing the statistics of the quality scores prior to compression. FaStore gathers these statistics while clustering the reads, and thus there is almost no added computational cost. The quantizers are generated at the end of the reads clustering and one global codebook per dataset is used. For lossless compression, FaStore uses QVZ in lossless mode.

The read identifiers are initially tokenized to make use of the fact that some of the appearing tokens are constant, some come from a small dictionary, etc. Moreover, since the complete identifier string is usually unnecessary in practice, FaStore also offers a lossy mode for storing them, either by removing the comments (as mappers do by default) or by completely skipping them. One needs to note that in the FASTQ format the representation of pairing information between the reads is not clearly defined, i.e., it can be carried by the read identifiers (a pair of reads share the same identifier) or at the file level (the reads reside on the same lines in two FASTQ files or are stored interleaved in a single file). Therefore, FaStore preserves this information with the sequences, allowing the identifiers to be removed and unique ones to be generated per pair of reads when decompressing.
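The two simplest lossy quality transforms mentioned above can be sketched as follows. The 8-level bin boundaries used here follow one commonly cited Illumina mapping and are an assumption – as noted later in the text, the actual mapping may depend on the instrument configuration – and QVZ itself is not reproduced.

```python
# A minimal sketch of Illumina 8-level binning and binary thresholding of
# Phred quality scores. Bin boundaries are an assumption (one commonly
# cited mapping); the representative value of each bin is the third entry.
ILLUMINA_8_BINS = [(2, 9, 6), (10, 19, 15), (20, 24, 22), (25, 29, 27),
                   (30, 34, 33), (35, 39, 37), (40, 60, 40)]

def illumina_bin(q):
    if q < 2:
        return q                      # very low scores (e.g. 'N' calls) kept as-is
    for lo, hi, rep in ILLUMINA_8_BINS:
        if lo <= q <= hi:
            return rep
    return 40

def binary_threshold(q, thr=20, qmin=6, qmax=40):
    """Quantize to two values around a user-provided threshold (as in a 'max'-style mode)."""
    return qmin if q < thr else qmax

quals = [2, 11, 23, 31, 38, 41]
print([illumina_bin(q) for q in quals])       # -> [6, 15, 22, 33, 37, 40]
print([binary_threshold(q) for q in quals])   # -> [6, 6, 40, 40, 40, 40]
```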

Compression factors

For the evaluation of the proposed compressor FaStore, we primarily used a subset of data sets already benchmarked in (Numanagić et al. 2016; Grabowski et al. 2015; Benoit et al. 2015), alongside new ones characterized by a high sequence coverage. The details of the employed datasets are summarized in Table 1 (see Supplementary Methods for the download links). The collection consists of 7 large sets of FASTQ files (top 7 rows) and one vast dataset (bottom row), and it includes sequencing data from the H. sapiens, G. gallus and C. elegans species, generated in a paired-end configuration. We compared the performance of FaStore with that of gzip (the de facto current standard in storage of sequencing data) and the top FASTQ compressors according to (Numanagić et al. 2016), which are: DSRC 2 (Roguski & Deorowicz 2014), Fqzcomp (Bonfield & Mahoney 2013), Leon (Benoit et al. 2015), Quip (Jones et al. 2012), and Scalce (Hach et al. 2012). We also tested the top DNA-only compressors according to (Numanagić et al. 2016), which are: ORCOM (Grabowski et al. 2015), Mince (Patro & Kingsford 2015), and BEETL (Cox et al. 2012). Unfortunately, for several data sets Mince ran out of available memory (128 GB) and BEETL failed to process some of them within 48 hours, so both of them are not included in our analysis. Figures 1a–d show the average compression factor and compression/decompression speeds for the complete collection. The applications were run using 8 processing threads, when applicable. Due to space constraints and ease of exposition, we present results of running the applications in maximum compression mode. Moreover, we provide results for the main lossy settings (denoted by reduced, lossy and max), and refer the reader to the Supplementary Worksheet W1 for an extensive evaluation of the whole range of lossy modes provided by FaStore, alongside the other tested applications in their different modes. As one can notice, FaStore in the lossless mode (preserving all the input data) achieves significantly better compression factors than the competitors executed in maximal compression modes. In particular, the compression gains with respect to the results achieved by the best competitor (i.e., Fqzcomp for all datasets except for dataset HSX, where Leon outperforms Fqzcomp) range from 7.6% to 20.3%. For example, for the H. sapiens datasets HS2 and WGS-42, this corresponds to more than 10 GB of savings in both cases. Although the lossless mode is used by default, we strongly recommend considering some of the provided lossy modes. By discarding parts of the read identifiers and reducing the resolution of the quality scores one can achieve significant savings in storage space. For example, in the reduced mode, the compressed size is about 60% of that of the lossless mode.

[Figure 1 appears here: four panels (a–d) comparing pigz, Scalce, Quip, DSRC, Fqzcomp, Leon and FaStore in its lossless, reduced, lossy and max modes; the axes show average compression factor [%], average fraction of the input file [%] split into DNA, Quality and ID, average compression speed [MB/s], and average decompression speed [MB/s].]

Figure 1: Compression results. (a) Average compression factors in % (compressed size divided by original size) for all examined datasets. (b) Average compression factors in % (compressed size divided by original size) for all examined datasets, divided by the different components: DNA bases, quality values, and IDs. (c) Average compression speeds for all examined datasets. (d) Average decompression speeds for all examined datasets. The sub-figures a–c share the same legend. pigz is a multithreaded variant of gzip (same compression ratios, but faster processing).

The improvement is possible thanks to applying Illumina 8-level binning (Illumina 2014) and removing the comments from the identifiers (which, in some cases, leads to storing only a library name and a read number). As the identifiers are usually truncated in this way by mappers when producing SAM files and the 8-level binning is becoming a default option in modern sequencers (although the actual mapping of the values can depend on the internal configuration of the instrument), this setting seems to be a reasonable choice. Even better results (approx. half the size of the lossless mode) are possible when QVZ with distortion level 2 is applied (lossy mode). Nevertheless, the best compression factor (about a quarter of what was obtained in the lossless mode) is achievable when the identifiers are removed (only the pairing information between the reads is preserved) and the binary thresholding for the quality values is applied (max mode).

Figure 1b shows the fractions of the archives consumed by the various components: DNA bases, quality values, and identifiers. There is no result for gzip as, due to the design of this compressor, it is impossible to measure the exact fraction of each component (it compresses all the data “as it is”). The results show that FaStore uses much less space to store the DNA symbols than the competitors. Since there is no loss of bases in the lossy modes, the amounts of space necessary for DNA symbols are almost identical. However, one needs to note that in the lossless mode FaStore needs more space for storing identifiers than the other competitors (except for Scalce). The reason is that after reordering the reads it is much harder to compress their identifiers, as the neighboring ones differ more than in the original ordering. Nevertheless, the compression gains in the DNA stream overshadow the loss in the identifiers stream. The most challenging to compress are, however, the quality scores. For most compressors, when the quality scores are stored in a lossless way they require more space than the DNA sequences and identifiers considered together. Thus, applying lossy schemes to the quality values has a remarkable impact on the total compression factor. For example, to losslessly compress the H. sapiens dataset HS2, FaStore requires 46.3 GB of space. Of those, 32.7 GB correspond to the quality scores, which can be further reduced to 9.3 GB (Illumina binning, reduced mode), 8.4 GB (QVZ with distortion level 2, lossy mode) or even 1.1 GB (binary thresholding, max mode). In all cases the total size is reduced by more than 50%, to as little as 14.7 GB (with binary thresholding). Moreover, one needs to note that this reduction in total size is computed without considering lossy compression of the identifiers, which would provide even more storage savings.

Next we demonstrate that such reductions in size are possible with little effect on variant calling. Finally, the compression speed is a drawback of our solution, as it is somewhat below 10 MB/s in the default C1 mode, since it applies multiple steps of reads preprocessing in order to achieve maximum compression of the DNA sequences. Nevertheless, the decompression speed is comparable to the fastest algorithms, i.e., DSRC 2 and gzip. Figures 1c–d show the compression and decompression speeds for the different methods analyzed in this paper. However, for use cases where compression speed is of utmost importance, FaStore provides a fast mode, namely, the C0 mode. This mode trades DNA compression ratio for compression speed, while still achieving better compression ratios than the competitors (see Supplementary Worksheet W1). As reported, mode C0 can, on average, reduce the compression time employed by mode C1 by a factor of 5 at the cost of increasing the size needed to store the DNA bases by a factor of 1.09. Note that switching between C0 and C1 has no significant effect on the compression of quality scores and read identifiers.
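For clarity, the metrics reported in Figure 1 can be illustrated with the small sketch below; it is only an illustration of how such numbers are typically derived, assuming the compression factor is the compressed size divided by the original size (in %) and that speeds are expressed in uncompressed megabytes processed per second.

```python
# A minimal sketch of the evaluation metrics (illustrative assumptions only).
def compression_factor(compressed_bytes, original_bytes):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_bytes / original_bytes

def speed_mb_per_s(original_bytes, elapsed_seconds):
    """Throughput in (uncompressed) megabytes per second."""
    return original_bytes / 1e6 / elapsed_seconds

# Example with the HS2 numbers quoted in the text (46.3 GB archive, 336.9 GB FASTQ):
print(round(compression_factor(46.3e9, 336.9e9), 1))  # -> 13.7 (%)
```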

Impact of lossy compression of quality scores on variant calling

Next, we assess the effects that the different lossy quality compression modes provided by FaStore (i.e., Illumina binning, binary thresholding, and QVZ) have on variant calling. Since QVZ optimizes the quantization for an average distortion level that is specified as an input parameter, for the analysis we considered distortion levels 1, 2, 4, 8, and 16. For the evaluation, we selected the two datasets coming from whole-genome sequencing of a H. sapiens individual, sequenced at a coverage of 14x (WGS-14) and 42x (WGS-42) (see Table 1). These datasets pertain to the same individual, namely NA12878, and were sequenced as part of the Illumina Platinum Genomes (Eberle et al. 2017). The reason for this choice is that the National Institute of Standards and Technology (NIST) has released a high-confidence set of variants for that individual as part of the Genome In a Bottle (GIAB) initiative (Zook et al. 2014). This allows us to consider this set as a “ground truth” and use it to benchmark the different lossy modes supported by FaStore. We refer the reader to the Supplementary Methods for a detailed description of the pipeline used for the analysis. In what follows we will report the results on Single Nucleotide Polymorphisms (SNPs), since SNPs are easier to detect and more curated in the high-confidence reference set. Nevertheless, for completeness, results for short insertions and deletions (INDELs) are provided in the Supplementary Worksheet W2. The GATK Best Practices propose applying VQSR for semi-automatic filtering of variants; however, the use of this machine-learning-based filter is still not widely adopted and should be used with caution for single-sample analyses. Hence, here we focus on the results obtained by applying hard filtering on the called set of SNPs, and refer the reader to the Supplementary Worksheet W2 for the results achieved by applying both modes of filtering.
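For reference, the precision and recall values reported below are the standard ones computed from true-positive, false-positive and false-negative variant counts, such as those emitted by a benchmarking tool like hap.py; the counts in this sketch are made up.

```python
# A minimal sketch of the precision/recall metrics used for SNP benchmarking.
def precision(tp, fp):
    """Fraction of called variants that match the truth set."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of truth-set variants that were called."""
    return tp / (tp + fn)

tp, fp, fn = 3_500_000, 18_000, 70_000   # hypothetical counts
print(f"precision={precision(tp, fp):.4f}  recall={recall(tp, fn):.4f}")
```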

[Figure 2 appears here. Panels (a) and (b): precision vs. recall of SNP calls for WGS-14 and WGS-42 under each quality-score mode – Lossless, Illumina8 (reduced), QVZ-D1, QVZ-D2 (lossy), QVZ-D4, QVZ-D8, QVZ-D16, Thr20 (max) and Discarded – with the compressed quality-stream size of each mode given in the legend (for WGS-14 from 12 GB lossless down to 0.6 GB in max mode; for WGS-42 from 34 GB down to 1.5 GB). Panel (c): original ordering vs. random permutation vs. FaStore permutation for WGS-14. Panel (d): compression factor [%] for WGS-42.]

Figure 2: Compression results and variant calling analyses. (a) Results of variant calling for the WGS-14 dataset. (b) Results of variant calling for the WGS-42 dataset. (c) Influence of the permutation of reads in the input collection on variant calling for WGS-14. (d) Compression factors in % (compressed size divided by original size) for the WGS-42 dataset.

The results of the analysis are presented in Figures 2a–c. For completeness, Figure 2d shows the compression factors for the WGS-42 dataset. We focus on the recall vs. precision results obtained using the considered lossy modes for the WGS-14 and WGS-42 datasets, respectively. Firstly, it needs to be noticed that the precision is similar for both datasets, whereas the recall is much higher for WGS-42. This already suggests that the higher (∼3×) sequence coverage plays a key role, encouraging the choice of lossy compression scheme according to the available coverage. The more interesting aspects are, however, the results achieved using the various lossy modes. As expected, increasing the distortion level of QVZ, being an adaptive method, reduces both the recall and the precision. However, what is interesting is that the variant calling performance when applying QVZ with distortion level 1 is comparable to that of Illumina binning, with slightly better results (in recall, and in precision for WGS-42) in favor of Illumina binning. Moreover, both modes offer a similar compression factor, reducing the size of the quality scores by more than 44% compared to lossless (with a slightly better result achievable by QVZ). Hence, the WGS-42 dataset could be compressed to 7.8% of its original size (and almost half of the space required to store it losslessly) by reducing the footprint of the quality scores and removing the comments from the read identifiers, while preserving the variant calling accuracy. As a side note, in our test case applying the lossy compression of read identifiers (as in the reduced and lossy schemes) has no effect on variant calling, as mappers usually strip the comments anyway. Nonetheless, in order to save more space while also preserving variant calling accuracy, stronger lossy compression modes need to be considered. For that reason, QVZ applied with distortion level 2 seems to be a good trade-off between variant calling performance and compression factor. It offers performance comparable to that obtained with the original data while reducing the size of the quality scores by more than 66%. Distortion levels above 2, although they offer significant gains in compression, show a degradation in variant calling. Quite surprisingly, for WGS-42 the results for the max mode are almost as good as for the lossless mode, with a vast difference in the size of the compressed quality data (1.5 GB vs. 34 GB). For comparison, we also experimented with completely removing the quality data, but the results (series denoted as discarded) show that some information about the base quality is necessary for reliable variant calling results (at least in the examined range of coverages). Therefore, the obtained results suggest that for storing datasets with a high coverage, keeping only the information on whether the called base is “good” or “bad” should be sufficient for achieving reliable results from variant calling.

Impact of read reordering on variant calling

As mentioned above, next-generation sequencing machines generate the DNA sequences in no particular order and the original order of the DNA sequences carries no significant information. Due to the large size of the produced data, several commonly used computational methods that operate on these files rely on heuristics to be able to run in a reasonable time in multi-threaded execution mode. For this reason, the reordering of reads, even if theoretically not relevant, may have some effect on variant calling. For example, the authors of (Firtina & Alkan 2016) showed that, for some mappers, randomly shuffling the input FASTQ reads can lead to different alignment results, especially for reads originating from highly repetitive genomic regions. Since FaStore permutes the input collection of reads, in this section we briefly examine the impact that various re-orderings have on variant calling (Fig. 2c). The goal is to analyze how FaStore shuffling may affect the performance of variant calling, examining precision and recall for the obtained SNPs. In order to assess this, we compare the variant calling results obtained using the original files, randomly shuffled files, and files reordered by FaStore. We shuffled the reads four times, creating four different paired-end FASTQ files. We also compressed the files losslessly using the C0 mode of FaStore, shuffling the reads as a result. Finally, we ran the same experiments on two different machines, using a different number of cores on each of them. As one can notice in Fig. 2c, the differences in precision and recall are negligible. Similar results can be observed for INDELs – for completeness, the results are provided in the Supplementary Worksheet W2.
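The pairing-preserving random shuffle used to create such reordered inputs can be illustrated with the sketch below. This is an illustration rather than the actual experiment script; file names are hypothetical and the FASTQ files are assumed to be uncompressed and mate-ordered.

```python
# A minimal sketch of shuffling a paired-end dataset while keeping mates
# synchronized: the same permutation is applied to both FASTQ files, so
# pairing at the file level is preserved.
import random

def load_records(path):
    with open(path) as f:
        lines = f.read().splitlines()
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]   # 4 lines per record

def shuffle_pairs(path1, path2, out1, out2, seed=42):
    r1, r2 = load_records(path1), load_records(path2)
    assert len(r1) == len(r2), "mate files must contain the same number of reads"
    order = list(range(len(r1)))
    random.Random(seed).shuffle(order)
    for out, recs in ((out1, r1), (out2, r2)):
        with open(out, "w") as f:
            for i in order:
                f.write("\n".join(recs[i]) + "\n")

# shuffle_pairs("reads_1.fastq", "reads_2.fastq", "shuf_1.fastq", "shuf_2.fastq")
```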

Influence of coverage

Finally, we analyzed the compression ratio just for the DNA symbols (Fig. 3) using whole-genome sequencing data of H. sapiens (the WGS-235 dataset) sampled at various coverages. As can be noted, for some algorithms (Scalce, Leon, Fqzcomp, FaStore) the increasing coverage leads to a significant improvement in compression ratio. In the case of FaStore, the advantage is more than 2-fold over the competitors. When testing the read-reordering algorithms (FaStore and Scalce), we also added a series of values in which the reads were compressed as single-end (i.e., the pairing information was lost). For FaStore this led to further savings in storage space, giving about 2.6 times better compression ratios.

[Figure 3 appears here: compression ratio in bits per base as a function of coverage (0–250×) on the WGS-235 subsets, for FaStore, FaStoreSE, Scalce, ScalceSE, Leon, Fqzcomp, Quip, DSRC 2 and gzip.]

Figure 3: Compression ratio for storing only DNA symbols (bits used to encode a single base) for H.sapiens sampled at various coverages (WGS-235 subsets). Superscript SE stands for Single-End.

Discussion

The storage and transfer of huge files containing raw sequencing data has become a real challenge. Years ago the popular general-purpose compressor gzip was applied to reduce file sizes by about 3 times, with significant gains in cost of storage and speed of transfer. Unfortunately, in modern times much more is necessary. Our proposed compressor, FaStore, is designed to achieve excellent compression factors, i.e., about 3 times better than gzip and significantly better than the existing specialized FASTQ compressors. In addition, FaStore offers several compression modes for the quality scores and the read identifiers, which result in significant compression gains. We strongly suggest that the community consider giving up storing all of the raw sequencing data in full, lossless form. As we have presented, with the increasing sequencing throughput and the dropping costs of sequencing, reflected in higher coverages for smaller prices, the high resolution of quality values seems to be unnecessary. An important step in this direction was made by Illumina, which has already reduced the resolution of the available quality scores to only 8 values in some of their newest sequencers. We show that similar variant calling results can be obtained even when a stronger reduction of the quality stream is applied. For a sufficiently large coverage it seems to be enough to provide just binary information about each base, telling whether it is “good” or “bad”. To illustrate the possible gains in cost reduction thanks to the lossy approaches: the FASTQ files for H. sapiens sequenced at 42-fold coverage in paired-end mode could consume as little as 10 gigabytes (in max mode), compared to 110 gigabytes of gzipped FASTQ files. Finally, in both cases the quality of variant calling is almost the same.

Methods

Compression workflow

In FaStore, the compression workflow has been designed as a multi-step process that tries to exploit the high sequence redundancy present in the sequencing data. It consists of (1) a reads clustering stage, (2) an optional reads re-clustering stage and (3) a reads compression stage, where each stage is divided into multiple smaller steps. In this section only a general overview of the compression workflow is provided – a detailed description of the methods used to compress DNA sequences, quality scores and read identifiers can be found in the Supplementary Methods. The workflow is depicted in Fig. 4 and can be briefly described as follows.

Reads clustering

The reads clustering stage is a 2-step process, consisting of reads binning (Fig. 4B) and reads matching (Fig. 4C). During binning, for each read from the input FASTQ file(s) (Fig. 4A), its sequence signature (i.e., the lexicographically smallest k-mer, also known as a minimizer, but with some restrictions) is sought. The signature is used as the identifier of the bin into which the read is placed. During this step, some statistics are gathered related to the observed DNA bases, base quality scores and read identifiers. At the end of the binning, these statistics are used to compute the quality score quantizers, which are stored in a global codebook (when the QVZ mode is used).

Figure 4: A general compression workflow of FaStore. (A) Raw FASTQ reads are (B) distributed into bins. (C) The reads are matched, giving as a result a reads similarity graph. Optionally, the reads follow further re-distribution and matching. (D) With the final similarity graph, the reads are assembled into contigs. (E) The reads data are encoded either in contigs or differentially, depending on the matching result.

Analogously, based on the observed tokens in the read identifiers, a token dictionary is built. These data will be used during the compression stage. After binning all the reads, we perform matching, independently for each bin. The goal is to find, for each read, a referential one which has possibly the lowest “encoding cost”, i.e. the number of operations required to transform one sequence into the other, under some user-specified constraints. In order to do so, we firstly reorder the reads, so that reads with sequences possibly originating from the same genomic region are placed close to each other. Then, we iterate over the reordered reads and, for each sequence, we search in a window of m previous ones for the best match. A read can be matched as a normal match, an exact match (an identical sequence was found) or as a hard read (when no satisfactory reference was found). The result of reads matching is represented as a similarity graph, where each node represents a read (DNA sequence) and each edge represents the type of match. More specifically, the result is a collection of trees, where each hard read represents a tree root (a tree can also consist of only a root node). With such a graph we can already proceed to the compression stage (as in C0 mode).
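The signature selection that drives the binning step can be illustrated as follows. This is a simplified sketch in which the signature is just the lexicographically smallest k-mer of the read, ignoring the additional restrictions that FaStore places on admissible signatures, and with an arbitrary k.

```python
# A minimal sketch of signature (minimizer) selection and read binning.
def signature(seq, k=8):
    """Return the lexicographically smallest k-mer of the read sequence."""
    if len(seq) < k:
        return seq
    return min(seq[i:i + k] for i in range(len(seq) - k + 1))

def bin_reads(reads, k=8):
    """Group reads by their signature, mimicking the binning step (Fig. 4B)."""
    bins = {}
    for read in reads:
        bins.setdefault(signature(read, k), []).append(read)
    return bins

reads = ["ACGTACGTTTGACCA", "TTGACCAACGTACGT", "GGGTTTAAACCCGGG"]
for sig, members in bin_reads(reads).items():
    print(sig, members)
```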

Reads re-clustering

However, in order to improve the clustering of the sequences, a number of optional reads re-clustering steps can be performed. The goal is to create larger clusters of highly similar (groups of) reads, possibly bringing the reads from the same genomic regions close to each other by re-distributing them. To do so, we firstly define a new subset of signatures, which will be used as a filter to select the bins into which the reads can be moved. Then, for each tree, we select a new root node which has a new signature residing at the beginning or at the end of its sequence. The trees are re-balanced and moved into bins (similarly as in Fig. 4B) denoted by their root signatures, where each tree is represented in the new bin as a single read (its root). This makes it possible to improve the clustering of the reads by performing an additional matching step (as in Fig. 4C) and, as a result, building larger trees of similar reads. If needed, further re-distribution steps can be performed (in C1 mode we perform 3); otherwise we proceed to the compression stage. The compression stage is a two-step process, consisting of assembling the read sequences (Fig. 4D) into contigs and encoding the reads (Fig. 4E), using the previously built reads similarity graph. Firstly, we traverse each tree and try to assemble the reads into possibly large contigs. The goal is to encode the reads with respect to the built consensus sequences, encoding only the variants (if present) in the contigs. While assembling a contig, for each read we try to anchor it in the consensus sequence using the position of its signature (which resides at the “center” of the consensus). To add the read to the contig, we assess whether it does not introduce too many variants into the current consensus sequence, as these will need to be encoded by the other reads already present in the contig. When no more reads can be added to the contig, its final consensus sequence is determined by majority voting. As a result, some of the nodes in the graph are replaced with contig nodes, updating the connections between the nodes accordingly.
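The majority-voting step that finalizes a contig's consensus can be sketched as below, assuming the reads have already been anchored at known offsets within the contig; FaStore's anchoring via signatures and its variant-cost checks are not reproduced here.

```python
# A minimal sketch of consensus building by per-column majority voting.
from collections import Counter

def consensus(anchored_reads):
    """anchored_reads: list of (offset, sequence) pairs; returns the consensus string."""
    length = max(off + len(seq) for off, seq in anchored_reads)
    columns = [Counter() for _ in range(length)]
    for off, seq in anchored_reads:
        for i, base in enumerate(seq):
            columns[off + i][base] += 1
    # Each consensus position takes the most frequent base observed in that column.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)

reads = [(0, "ACGTACGT"), (2, "GTACGTTA"), (4, "ACGTTAGG")]
print(consensus(reads))  # -> "ACGTACGTTAGG"
```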

Reads compression

Finally, we proceed to encode the reads data (Fig. 4E), storing the result in a number of streams, separately for DNA sequences, quality scores, and read identifiers. The read sequences are encoded either in contigs (encoded differentially versus the consensus sequences) or differentially versus each other, depending on the matching result. To encode the quality scores using QVZ we use the quantizers from the previously created codebook. Alternatively, when using Illumina 8-level binning or binary thresholding, we encode the transformed quality values. In parallel, we encode the read identifiers using the previously built dictionary. The streams are compressed using a custom arithmetic coder or the general-purpose compressor PPMd.
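As an illustration of the idea of differential sequence encoding (not FaStore's actual on-disk representation), a read matched against a consensus or referential read can be reduced to an offset plus the positions and bases at which it differs:

```python
# A minimal sketch of differential encoding of a read against a reference
# sequence: store the offset plus (position, base) pairs for mismatches.
def encode_diff(read, reference, offset):
    window = reference[offset:offset + len(read)]
    return offset, [(i, b) for i, (a, b) in enumerate(zip(window, read)) if a != b]

def decode_diff(reference, offset, length, mismatches):
    seq = list(reference[offset:offset + length])
    for i, b in mismatches:
        seq[i] = b
    return "".join(seq)

consensus = "ACGTACGTTAGGACCA"
read = "ACGTTCGG"
off, mism = encode_diff(read, consensus, 4)
print(off, mism)                               # -> 4 [(5, 'C')]
assert decode_diff(consensus, off, len(read), mism) == read
```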

Variant calling

To investigate the side effects of applying lossy compression to the base quality scores, we firstly prepared a set of test WGS-14 and WGS-42 FASTQ files, which come from deep sequencing of the NA12878 H. sapiens individual. These files included: (a) the original input files (lossless), (b) the original input files with lossily compressed (or discarded) quality scores, (c) FaStore-shuffled reads with lossily compressed quality scores. Moreover, using the WGS-14 dataset we tested the effect of reordering the reads using an additional set of test FASTQ files (but without applying any compression). These included: (d) the original input files with randomly shuffled reads, (e) FaStore-shuffled reads. A detailed description of the FASTQ file preparation steps can be found in the Supplementary Methods. With the input FASTQ files prepared in this way, we followed the GATK (McKenna et al. 2010) Best Practices (Auwera et al. 2013) pipeline to assess the variants. We used BWA-MEM (Li & Durbin 2009; Li 2013) to map the reads to the human genome assembly GRCh37. Following a number of alignment post-processing steps, we called the variants using GATK HaplotypeCaller (GATK-HC). For assessing the variant calling performance, we used as a “gold standard” the variants from GIAB (Zook et al. 2014) and benchmarked our results using the Illumina Haplotype comparison tools pipeline (https://github.com/Illumina/hap.py). This pipeline is also recommended by the Global Alliance for Genomics and Health (GA4GH) as one of the standard benchmarking protocols. We reported precision and recall results for the obtained SNPs. For completeness, we also filtered the variants using GATK Variant Quality Score Recalibration (VQSR). Both the SNP and INDEL calling results, with and without VQSR filtering, are available in the Supplementary Worksheet W2.

Software availability

FaStore can be downloaded from https://github.com/refresh-bio/FaStore.

Acknowledgments

We would like to thank Ivo Gut for supporting the project and Marcos Fernández for helpful discussions and technical insights. This work was supported by: National Science Centre, Poland [under project DEC-2015/17/B/ST6/01890 to S.D.]; European Union’s Seventh Framework Programme (FP7/2007-2013) [under grant agreement No. 305444 (RD-Connect) to Ł.R.].

Competing financial interests

The authors declare no competing financial interests.

References

Auwera, G. et al. 2013. From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Current Protocols in Bioinformatics.
Benoit, G. et al. 2015. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC Bioinformatics 16, 288.
Bonfield, J.K., Mahoney, M.V. 2013. Compression of FASTQ and SAM format sequencing data. PLOS ONE 8, e59190.
Clarke, L. et al. 2012. The 1000 Genomes Project: data management and community access. Nat. Methods 9, 459–462.
Cock, P.J., Fields, C.J., Goto, N., Heuer, M.L. & Rice, P.M. 2010. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38, 1767–1771.
Cox, A.J. et al. 2012. Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform. Bioinformatics 28, 1415–1419.
Deorowicz, S., Grabowski, Sz. 2011. Compression of DNA sequence reads in FASTQ format. Bioinformatics 27, 860–862.
Eberle, M. et al. 2017. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Research 27, 157–164.
Firtina, C., Alkan, C. 2016. On genomic repeats and reproducibility. Bioinformatics.
Grabowski, S., Deorowicz, S., Roguski, Ł. 2015. Disk-based compression of data from genome sequencing. Bioinformatics 31, 1389–1395.
Hach, F., Numanagić, I., Alkan, C. & Sahinalp, S.C. 2012. SCALCE: boosting sequence compression algorithms using locally consistent encoding. Bioinformatics 28, 3051–3057.
Hernaez, M., Ochoa, I., Weissman, T. 2016. A cluster-based approach to compression of quality scores. In Proc. of Data Compression Conference, pp. 261–270.
Jones, D.C., Ruzzo, W.L., Peng, X. & Katze, M.G. 2012. Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic Acids Res. 40, e171.
Kavak, P. et al. 2015. Robustness of massively parallel sequencing platforms. PLOS ONE 10, e0138259.

Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
Li, H. and Durbin, R. 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760.
Li, H. et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Malysa, G. et al. 2015. QVZ: lossy compression of quality scores. Bioinformatics 31, 3122–3129.
McKenna, A. et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297–1303.
Numanagić, I. et al. 2016. Comparison of high-throughput sequencing data compression tools. Nat. Methods 13, 1005–1008.
Ochoa, I. et al. 2016. Effect of lossy compression of quality scores on variant calling. Brief. Bioinform. 18, 183–194.
Patro, R. & Kingsford, C. 2015. Data-dependent bucketing improves reference-free compression of sequencing reads. Bioinformatics 31, 2770–2777.
Roguski, Ł. & Deorowicz, S. 2014. DSRC 2 – Industry-oriented compression of FASTQ files. Bioinformatics 30, 2213–2215.
Schadt, E.E. et al. 2010. Computational solutions to large-scale data management and analysis. Nature Reviews Genetics 11, 647–657.
Stephens, Z.D. et al. 2015. Big Data: Astronomical or Genomical? PLOS Biol. 13, e1002195.
The 100,000 Genomes Project. 2017. https://www.genomicsengland.co.uk/the-100000-genomes-project-by-numbers/.
Illumina. 2014. Reducing Whole-Genome Data Storage Footprint. White paper, https://www.illumina.com/documents/products/whitepapers/ whitepaper datacompression.
Yu, Y.W., Yorukoglu, D., Peng, J., Berger, B. 2015. Quality score compression improves genotyping accuracy. Nature Methods 33, 240–243.
Zook, J.M. et al. 2014. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nature Biotechnology 32, 246–251.

3.4 CARGO: effective format-free compressed storage of genomic information

Roguski Ł, Ribeca P. CARGO: effective format-free compressed storage of genomic information. Nucleic Acids Res. 2016 Jul 8;44(12):e114. DOI: 10.1093/nar/gkw318

3.5 Brief summary

3.5.1 Context

High-throughput sequencing data is usually represented either as raw reads in FASTQ format and stored in compressed form using gzip, or as aligned reads in SAM format and stored in compressed form as BAM. The HTS data represented in SAM format can be considered a superset of the data stored in FASTQ format, where the SAM alignments contain a significant amount of intermediate data generated during the sequence mapping and data post-processing stages. Information such as the possible mapping location of the read in the reference genome and the indicator of the confidence of the mapping is used in further genomic data analysis steps, allowing efficient access to the alignments by their mapping position. Usually, the backward conversion from the aligned reads in SAM format to the raw reads in FASTQ format is possible (given that the information carried with the raw reads was not discarded during the mapping and post-processing steps), by using different tools, such as Picard or SAMTools. Therefore, different solutions for storing HTS data are available, the application of which depends on the genomic data processing pipelines used.

During our research we have designed, developed, and tested a set of novel methods and approaches for storing and compressing HTS data. As presented in the publications included in this chapter, we performed a number of analyses compressing both FASTQ and SAM files originating from a variety of different sequencing experiments, using different sequencing platforms and coming from different organisms. At the same time, one can also observe other novel methods being developed. Therefore, the aim of this brief summary is to compare the methods that we have developed with the current state-of-the-art using a concise and small dataset.

Data categories

As mentioned in Section 2.1, when compressing raw reads in FASTQ format we can distinguish three main categories of data: read identifiers (denoted as ID), DNA base sequences (SEQ) and base quality scores (QUA). When compressing alignments in SAM format, we will consider four main categories of compressed data: query template names (ID), sequence alignment data (ALN), quality scores (QUA), and optional information (OPT). The SAM fields which compose the ALN data category include: the alignment flags (FLAG field), the segment sequence (SEQ field), the pairing information between the reads (RNEXT, PNEXT, and TLEN fields), and the CIGAR field. Here, we also consider the values of the mapping quality (MAPQ field) assigned by the mapper as optional information (storing them in the OPT category alongside the content of the optional fields), because they are not required to losslessly compress or decompress the sequence data. One needs to note, however, that the above-mentioned classification is arbitrary, but it allows for a non-ambiguous and practical separation of the data categories – other classifications are also possible.
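The categorization described above can be expressed as a small sketch that splits one SAM line into the ID, ALN, QUA and OPT streams. It follows the classification given in the text; assigning RNAME and POS to the ALN category is an assumption (the text lists the ALN fields non-exhaustively), and MAPQ is treated as optional information.

```python
# A minimal sketch of splitting one SAM record into the data categories
# used in this summary (ID, ALN, QUA, OPT).
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

CATEGORY = {"QNAME": "ID", "QUAL": "QUA", "MAPQ": "OPT",
            "FLAG": "ALN", "RNAME": "ALN", "POS": "ALN", "CIGAR": "ALN",
            "RNEXT": "ALN", "PNEXT": "ALN", "TLEN": "ALN", "SEQ": "ALN"}

def categorize(sam_line):
    fields = sam_line.rstrip("\n").split("\t")
    out = {"ID": [], "ALN": [], "QUA": [], "OPT": []}
    for name, value in zip(SAM_FIELDS, fields[:11]):
        out[CATEGORY[name]].append((name, value))
    # Optional TAG:TYPE:VALUE fields go to the OPT category.
    out["OPT"].extend(("TAG", tag) for tag in fields[11:])
    return out

line = "read1\t99\tchr1\t10468\t60\t101M\t=\t10600\t233\tACGT\tFFFF\tNM:i:0"
print(categorize(line))
```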

Regarding the possible pairing information between the reads, we decided not to introduce any additional category for representing it, since, in the current SAM format data representation, it is not possible to fully decouple the pairing data from the data stored in the other categories. For example, in addition to the pairing information stored in the RNEXT, PNEXT, and TLEN fields, the FLAG field also stores bit-wise information about pairing, such as which read is considered the first one of the pair or whether both reads are aligned. Moreover, in the case of unaligned reads, some of the fields storing the pairing data may be left with an empty/dummy value. Hence, the pairing between the unaligned reads is preserved by the (unique) query template names. Similarly, in the case of FASTQ files, the representation of pairing information between the reads is not clearly defined, as it can be preserved either by the read identifiers or on the file level (see the format descriptions in Section 1.1.3).

Datasets

To briefly compare our solutions with the current state-of-the-art, we selected as input data two datasets, namely WEX and WGS, which have also been used in the tests performed in Section 3.3. We compare the results of compressing these datasets both in their raw (FASTQ) and aligned (SAM) representations. These datasets were obtained from a whole-exome sequencing (WEX; ~240× coverage) and from a whole-genome sequencing (WGS; ~42× coverage) experiment of H. sapiens individuals. Both were sequenced in a paired-end library setup and using the Illumina HiSeq platform. They represent a common use-case of sequencing in clinical diagnostics.

The WEX dataset comes from GIAB [222] and is originally available in BAM format (aligned to the reference human genome assembly version 37 (GRCh37) and sorted by position). Hence, it requires conversion to FASTQ format in order to test compression methods for raw reads. The conversion from BAM to SAM format is straightforward. Both operations can be accomplished using SAMTools or Picard tools. The WGS dataset comes from Illumina Platinum Genomes1 and is available in FASTQ format through the SRA platform. Therefore, in order to provide its representation in SAM format, we need to map the raw reads to the reference genome (using the same human genome assembly as for the WEX dataset) with BWA-MEM [108], following the GATK Best Practices [11] (as explained in Section 1.1.3) – the command lines used, together with the tool versions, can be found in Appendix A. It is important to note that, since mappers usually remove the comments present in the FASTQ read identifier line, we also removed them before testing the compression of the FASTQ files (in the case of the WEX dataset, they had already been removed).

Selected information about the datasets, including the sizes of the different kinds of data (in GB) both in FASTQ and SAM formats, is presented in Table 3.1. Interestingly, for both datasets the number of records (alignments) in SAM format is greater than the number of records (raw reads) in FASTQ format (WEX: ~0.1 M, WGS: ~3.3 M more). The greater number of SAM records may signify the existence of chimeric reads or multiple possible alignments of reads (or pairs). As mentioned in Section 1.1.3, when multiple alignments are reported, some of the content present in the main alignment (such as query template names, DNA sequences, or quality scores) is duplicated in the SAM representation. Moreover, although the number of reported SAM alignments is greater than the number of FASTQ reads, the reported size of the ID category in SAM format is smaller than its equivalent in FASTQ. This is due to the fact that the read identifiers in FASTQ format start with the ‘@’ symbol, which is removed in SAM query template names. The links to obtain the datasets, together with the explanation of the protocol used to perform the conversion between the formats, can be found in Appendix A.

1https://www.illumina.com/platinumgenomes.html

Table 3.1: Summary of the WEX and WGS datasets with a comparison between the sizes of each class of data in FASTQ and SAM formats.

                          FASTQ format                                SAM format
Name   Read len.   Recs.    Size    ----Data class----     Recs.    Size    -------Data class-------
       (bp)        (M)      (GB)    SEQ     ID     QUA     (M)      (GB)    ALN     ID     QUA    OPT
WEX    126         150.4    44.7    18.9    6.1    18.9    150.5    55.5    23.3    5.9    19.0   5.0
WGS    101         1304.5   295.5   131.8   25.4   131.8   1307.8   391.0   169.8   24.2   131.9  45.4

Methods overview

In our brief analysis, we have used the solutions that have been previously analyzed and tested in the publications included in this chapter, but, where possible, in their newest and officially supported versions. Moreover, we only consider solutions that fully support lossless compression of the up-to-date SAM and FASTQ formats – they have been described in more detail in Chapter 2. Following the classification introduced in Section 2.1 (FASTQ format) and Section 2.2 (SAM format), we can distinguish the following categories of compression methods.

When compressing raw reads in FASTQ format, we consider the general methods (DSRC2 in FAST and MAX compression modes, Fqzcomp [19] in STD and MAX modes, Quip [85] in STD mode, and the CARGO-based family of compressors), the assembly-based ones (Quip in MAX mode, LEON [15]) and the reads-reordering-based ones (SCALCE [70], FaStore in C0 and C1 modes). Unfortunately, like the authors of [151], when benchmarking different FASTQ compression solutions we encountered difficulties in successfully running reference-based compression methods and, as a result, they have been excluded from our brief analysis. As a side note, for the current general and assembly-based compression methods it does not matter whether the input data have been generated in single-end or paired-end mode – they process the data as they are, without altering the read order. By contrast, the reads-reordering solutions SCALCE and FaStore explicitly provide a paired-end processing mode, to preserve the pairing between the reads on the file level. As our input data have been generated from a paired-end library, we will primarily focus on the paired-end compression mode provided by these tools (suffix -PE). However, for comparison purposes only, we will also include the results obtained when compressing the datasets in single-end mode (suffix -SE; losing the pairing information on the file level after decompression).

When compressing alignments in SAM format, we can distinguish two main categories of compression methods, namely: the non-reference-based ones (Quip, SCRAMBLE [18], and DeeZ [71] run in NOREF mode; CARGO SAM format compressors in STD and EXT variants) and the reference-based ones (Quip, SCRAMBLE, and DeeZ run in REF mode; CARGO-SAM-REF). Moreover, when performing reference-based compression, SCRAMBLE, implementing the CRAM format (both in version 2.0 and in the newest one, 3.0; version 2.0 was tested in Section 3.4), also offers an option to embed the provided reference sequence in the compressed archive when run in EMBREF mode, hence not requiring the reference to be provided externally during decompression. As a side note, we also included in our tests the non-reference-based SCRAMBLE compressor implementing CRAM format version 2.0, but only as a reference point, since it does not preserve all the SAM optional fields.

Finally, in our tests we also included the gzip-based compression solutions, which, being the most commonly used, are considered the de facto standard. These are: pigz2 (a parallel version of the gzip compressor) and the SCRAMBLE-based implementation of the BAM format.

All the applications were run in multithreading mode using 8 threads, if supported. Only Quip and Fqzcomp do not support compression with an externally specified number of processing threads – when run in multithreading mode (set by default) they compress each kind of data in a separate thread. Moreover, some methods, such as the CARGO-based family of FASTQ format compressors or the simple CARGO-SAM-STD, have been provided as proofs-of-concept, being semi-automatically generated. Hence, a direct and complete comparison with format-specific and optimized state-of-the-art compressors is not an easy task.

3.5.2 Results

Overview

The tests were performed on the cluster at the Centro Nacional de Análisis Genómico3 (CNAG), which consists of more than 100 compute nodes, each having two Intel Xeon Quad Core 2.93 GHz processors with 48 GB of RAM. It has about 3 PB of network-distributed hard-drive storage mounted as a Lustre parallel file system4. Inter-node communication and data transfer are performed via a dedicated InfiniBand network.

2http://zlib.net/pigz/
3http://www.cnag.cat
4http://lustre.org/

In Table 3.2 we show the results of compressing raw reads in FASTQ format for both the WEX and WGS datasets. Analogously, in Table 3.3 we show the results of compressing alignments in SAM format. Some of these results are best visualized as a picture – Fig. 3.1 shows the comparison between the compressed sizes of the WEX dataset that each of the methods achieves, both for the data represented in FASTQ and in SAM format. Analogously, Fig. 3.2 shows these results for the WGS dataset.

The tested compression solutions which are currently considered the de facto standard are PIGZ (FASTQ) and SCRAMBLE-BAM (SAM). PIGZ managed to compress the input WEX dataset in FASTQ format, initially 44.8 GB in size, to 11.7 GB (28% of its original size) and the WGS dataset of 295.5 GB to 98.2 GB (33%). In comparison, SCRAMBLE-BAM compressed the WEX dataset in SAM format of 55.5 GB in size to 9.8 GB (18%) and the WGS dataset of 391 GB to 92.5 GB (24%). Interestingly, the aligned reads stored in SAM format occupy significantly more space than their equivalent raw representation in FASTQ format, yet compressed as a BAM file they occupy less space than the FASTQ file compressed using gzip. This is primarily due to the fact that highly similar DNA sequences (mapping to the same or neighboring positions in the reference genome) reside much closer to each other in the file. Hence, the possible overlaps between sequences can be more easily encoded by the dictionary-based compression method used in the BAM format. Moreover, when compressing FASTQ files using the CARGO-FQ-GZIP method, which decouples the records' data into three separate streams and compresses them using the same compression algorithm as PIGZ, a significant improvement in compression ratio can be achieved compared to PIGZ. CARGO-FQ-GZIP managed to compress the WEX FASTQ dataset to 10.1 GB (23%) and the WGS one to 86.2 GB (29%), which, in the case of the WGS dataset, is also a better compression result than compressing it in SAM format using SCRAMBLE-BAM. This simple experiment clearly shows the importance of decoupling the records' data prior to the actual compression.
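To illustrate the idea behind this decoupling (a minimal sketch of the principle only, not the CARGO-FQ-GZIP implementation; the output file names and the zlib compression level are arbitrary), a FASTQ file can be split into identifier, sequence and quality streams, each compressed independently with the same gzip algorithm:

#include <fstream>
#include <string>
#include <zlib.h>

// Minimal sketch of stream decoupling: split each FASTQ record into its
// identifier, sequence and quality parts and gzip each stream separately.
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    gzFile idOut  = gzopen("reads.id.gz",  "wb6");
    gzFile seqOut = gzopen("reads.seq.gz", "wb6");
    gzFile quaOut = gzopen("reads.qua.gz", "wb6");

    std::string id, seq, plus, qua;
    while (std::getline(in, id) && std::getline(in, seq) &&
           std::getline(in, plus) && std::getline(in, qua)) {
        id += '\n'; seq += '\n'; qua += '\n';
        gzwrite(idOut,  id.data(),  static_cast<unsigned>(id.size()));
        gzwrite(seqOut, seq.data(), static_cast<unsigned>(seq.size()));
        gzwrite(quaOut, qua.data(), static_cast<unsigned>(qua.size()));
        // The '+' separator line carries no information and is not stored.
    }
    gzclose(idOut); gzclose(seqOut); gzclose(quaOut);
    return 0;
}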

Table 3.2: Summary of the results of compressing the WEX and WGS datasets in FASTQ format. The sizes are reported in gigabytes; the compression and decompression speeds are in MB/s. Ratio is the ratio between the size of the original file (or data class) and its compressed size, whereas Inv. is the inverse relation; Frac. denotes the fraction of the total compressed size that the data class occupies.

Table 3.3: Summary of the results of compressing the WEX and WGS datasets in SAM format. The sizes are reported in gigabytes; the compression and decompression speeds are in MB/s. Ratio, Inv., and Frac. are defined as in Table 3.2.

Figure 3.1: Comparison of compressed file sizes and data streams between different compression methods for the WEX dataset represented in FASTQ and SAM formats. The PIGZ and SCRAMBLE-BAM solutions do not decouple the records' content when compressing the data, hence we only report the size of the whole compressed file (denoted as NA). The categories of compression methods are, respectively: general ones (FASTQ-A and SAM-A), assembly-based (FASTQ-B), reads-reordering-based (FASTQ-C), and reference-based (SAM-B). Data classes: SEQ/ALN, ID, QUA, OPT, NA.

Figure 3.2: Comparison of compressed file sizes and data streams between different compression methods for the WGS dataset represented in FASTQ and SAM formats. The PIGZ and SCRAMBLE-BAM solutions do not decouple the records' content when compressing the data, hence we only report the size of the whole compressed file (denoted as NA). The categories of compression methods are, respectively: general ones (FASTQ-A and SAM-A), assembly-based (FASTQ-B), reads-reordering-based (FASTQ-C), and reference-based (SAM-B). Data classes: SEQ/ALN, ID, QUA, OPT, NA.

Nonetheless, the highest overall compression ratio for the FASTQ format was achieved by the reads-reordering method FaStore. In C1 mode it managed to compress the WEX dataset to 4.4 GB (10% of its original size) and the WGS dataset to 44.9 GB (15%). For the SAM format, the highest overall compression ratio was achieved by the reference-based compression methods. SCRAMBLE-CRAM3-REF managed to compress the WEX dataset in SAM format to 4.7 GB (8%) and the WGS dataset to 50.3 GB (13%). Similar results have been achieved by the reference-based version of the CARGO-based compressor, followed by Quip and DeeZ. What is interesting are the high compression results achieved by the simple proof-of-concept CARGO-based SAM format compressors, which show that there is still significant room for improvement in the current state-of-the-art.

Compression of quality scores

As can be clearly seen from Figures 3.1 and 3.2, for the majority of the solutions the quality scores occupy the largest amount of space – up to 70–74% (WEX) or 80–82% (WGS) of the resulting compressed file. They are the most difficult kind of data to compress, primarily due to their high entropy and to the significant degree of noise they contain. Nonetheless, in the case of compressing quality scores in the FASTQ files, the best result was achieved by Fqzcomp, which managed to store them in 2.9 GB (15% of their original size) for the WEX dataset and in 32.9 GB (25%) for the WGS one. Similarly, in the case of compressing SAM files, Quip provided the best compression results, storing the quality scores of the WEX dataset in 2.9 GB (16%) and of the WGS dataset in 34.6 GB (26%).
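To make the notion of "high entropy" concrete, the zero-order empirical entropy of a quality stream can be estimated as below; this is only a rough, memoryless lower bound and not the model used by any of the tested compressors:

#include <array>
#include <cmath>
#include <string>

// Zero-order empirical entropy (bits per symbol) of a quality-score stream.
// It gives a rough lower bound on the compressed size under a memoryless
// model; real compressors exploit context, so they may do better or worse.
double qualityEntropyBits(const std::string& qualities) {
    if (qualities.empty()) return 0.0;
    std::array<double, 256> counts{};
    for (unsigned char q : qualities) counts[q] += 1.0;
    const double n = static_cast<double>(qualities.size());
    double h = 0.0;
    for (double c : counts)
        if (c > 0.0) h -= (c / n) * std::log2(c / n);
    return h;  // multiply by n/8 to estimate the size in bytes
}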

Compression of the sequence and alignment data

After the quality scores, the next type of data occupying most of the space is the sequence (FASTQ) and alignment (SAM) data. When compressing the sequence data in FASTQ format, the best compression ratio was achieved by FaStore, which managed to compress it down to 6% of its original size. For comparison, LEON, which does not reorder the reads (but performs a read preprocessing step before their assembly), managed to reduce the size of the sequence data to 9–10% of its original size. Similarly, FQZCOMP-MAX, a general method, managed to reduce the sequence size to 11% and 18% for the WEX and WGS datasets respectively. The compressed sequence data can occupy up to 23–58% (WEX) or 19–47% (WGS) of the resulting file. As a side note, when running FaStore in single-end mode (i.e., discarding the pairing information between the reads on the file level), the sequence data can be further reduced to about half of the size achieved in paired-end mode – down to about 3% of its original size for both datasets.

When compressing alignments in SAM format sorted by mapping position (an operation which can also be treated as a form of read-reordering transformation), the best compression was achieved by the reference-based methods. SCRAMBLE-CRAM3-REF achieved the best result: for both datasets it managed to reduce the alignment data to about 2% of its original size. Similar results were achieved by the reference-based CARGO solution. Considering now the non-reference-based methods, the best compression result was also achieved by SCRAMBLE (SCRAMBLE-CRAM3-NOREF), which managed to reduce the alignment data to about 5% (WEX) and 4% (WGS) of its size. Here, analogously, comparable results were achieved by CARGO and DeeZ. Interestingly, the solutions which implement both reference and non-reference compression modes, when run in non-reference mode, report the size of the compressed ALN data as almost twice the size obtained in reference-based mode. It is important to mention that, when the solutions are not run in reference-based mode, some optional fields need to be stored explicitly in the resulting file, as they cannot be regenerated automatically without access to the reference; being able to regenerate them brings additional, significant savings. For example, when run with access to the reference sequence, both SCRAMBLE and the CARGO solutions can reduce the size of the optional fields content to 3–4% of its original size, compared to 6–7% when not using the reference. Finally, and interestingly, when running SCRAMBLE in the mode which embeds the reference sequence in the resulting compressed archive (the EMBREF suffix), the size overhead of storing the reference sequence (included in the ALN class) compared to non-reference-based compression is negligible. In the case of the WGS dataset, the compression results are even better than when run without the reference sequence. The possible improvement (or decrease) in compression ratio, in fact, strictly depends on the depth and the uniformity of the sequence coverage of the compressed dataset. As a side note, running SCRAMBLE with an embedded reference sequence also allows it to re-generate some of the optional fields during decompression, hence not requiring them to be stored.

Compression of the read identifiers

The significant reduction achieved above in the compressed size of data in the SEQ and ALN categories is primarily due to reordering the reads. The reads can either be compressed in groups sharing a large degree of sequence similarity between consecutive reads (FASTQ) or compressed as mapped to a reference genome and sorted (reordered) by mapping position (SAM). However, the significant gains in sequence and alignment data compression achievable by these methods come at the cost of decreasing the compression efficiency of the read identifiers. This is due to the fact that in the initial FASTQ files the consecutive read identifiers are likely to exhibit some sort of order, e.g., the numerical values associated with some tokens (such as arbitrary read numbers or flowcell coordinates) tend to appear in increasing order. Therefore, the read identifiers in the reordered FASTQ files are more difficult to compress – this can be clearly seen in Figures 3.1 and 3.2.
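The effect can be illustrated with a simple token-level model (a hypothetical sketch, not the DSRC2 or FaStore tokenizer): identifiers are split into tokens and the numeric tokens of consecutive records are delta-encoded, which yields small, highly compressible values only as long as the original sequencer order is preserved:

#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Sketch of token-level identifier modelling. Identifiers such as
// "@HWI-ST1234:8:1101:1235:1993/1" (a made-up example) are split into
// tokens; numeric tokens from consecutive records are stored as deltas.
std::vector<std::string> tokenize(const std::string& id) {
    std::vector<std::string> tokens;
    std::string cur;
    for (char c : id) {
        if (c == ':' || c == '.' || c == '/' || c == ' ') {
            tokens.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    tokens.push_back(cur);
    return tokens;
}

// Delta between matching numeric tokens of two consecutive identifiers;
// non-numeric or mismatching tokens get a delta of 0 (stored verbatim).
std::vector<int64_t> numericDeltas(const std::vector<std::string>& prev,
                                   const std::vector<std::string>& curr) {
    std::vector<int64_t> deltas;
    for (size_t i = 0; i < curr.size() && i < prev.size(); ++i) {
        bool numeric = !curr[i].empty() && !prev[i].empty();
        for (char c : curr[i]) numeric = numeric && std::isdigit(static_cast<unsigned char>(c));
        for (char c : prev[i]) numeric = numeric && std::isdigit(static_cast<unsigned char>(c));
        deltas.push_back(numeric ? std::stoll(curr[i]) - std::stoll(prev[i]) : 0);
    }
    return deltas;
}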

This problem is especially visible for the WGS dataset, where the identifiers can be compressed to less than 1% of their original size by a number of different methods, whereas FaStore compresses them to about 10%. Similarly, all the SAM solutions compress them to 19–22% of their original size, with the best result on this dataset achieved by the CARGO-based methods. One needs to note, however, that the WGS dataset contains identifiers following the SRA format, which, in addition, were trimmed before compression (as explained in the previous subsection). Therefore, when considering the WEX dataset, which presumably contains the original identifiers generated by the sequencing machine, these differences in compressed size are less significant. FaStore in paired-end mode can compress the identifiers to about 6% of their original size. For comparison, LEON achieves the best compression results for this category of data and compresses them to about 4%. All the SAM solutions compress the identifiers to 10–14%, with the best result achieved by DeeZ. However, when considering the sizes of compressed data from the SEQ and ID (FASTQ) or ALN and ID (SAM) categories together, it can be seen that the best compression ratio is achieved either by the reads-reordering-based methods (FASTORE-C1) or by the reference-based methods (SCRAMBLE-CRAM3-REF, CARGO-SAM-REF).

Compression speed comparison

Regarding the compression and decompression speeds offered by the different solutions, the best way to visualize them is again as a picture: Fig. 3.3 and Fig. 3.4 show the performance speeds of the different solutions in a 2D space, for the FASTQ and SAM formats respectively.

When compressing FASTQ files, DSRC2-FAST achieves the best performance – more than 300 MB/s when compressing and more than 400 MB/s when decompressing. Comparable decompression speeds (over 400 MB/s) were also achieved by CARGO-FQ-GZIP. Interestingly, although CARGO-FQ-GZIP and PIGZ achieve comparable compression speeds (~15 MB/s), the decompression speeds offered by CARGO-FQ-GZIP are at least twice those offered by PIGZ, while using the same compression settings and number of processing threads. Moreover, it can be noted that some solutions offer a higher compression ratio at the cost of spending more computational resources during the data compression stage. They perform different read preprocessing operations, such as: reordering the reads (FaStore, SCALCE), performing an assembly of the reads (LEON), or performing an exhaustive search for sequence matches in the different streams (CARGO-FQ-GZIP, CARGO-FQ-LZMA). Nonetheless, thanks to the resulting reduced output file size and a reasonably lightweight decompression algorithm, they can also achieve relatively high decompression speeds.

In the case of the SAM format, the newest SCRAMBLE-based implementation of the CRAM format provides the highest performance for both datasets, offering compression speeds of ~170 MB/s and decompression speeds of ~270 MB/s. Similarly, when transcoding SAM files into BAM format, it achieves ~130 MB/s (WGS) or ~190 MB/s (WEX) with a decompression speed of ~270 MB/s for both datasets. By comparison, the CARGO-SAM-REF method is left just behind, offering compression speeds of ~120–130 MB/s and decompression speeds of ~190–210 MB/s. The non-reference-based versions of the above-mentioned methods achieve lower compression speeds, yet they still achieve relatively high performance speeds among all the tools tested. As a side note, the compression and decompression speeds offered by the different methods for the FASTQ and SAM formats cannot be directly compared – to measure the compression speed of SAM compression methods in a fair way, we would need to include the time spent on aligning the reads to the reference sequence and the time spent on sorting the alignments by position, both taken by external tools. Moreover, in the case of FASTQ files, although different compression methods have been compared, the ones which apply specialized read preprocessing operations will usually achieve a superior compression ratio compared to general-purpose methods. These improvements in compression ratio come, however, at the cost of spending additional resources on the read preprocessing step.


Figure 3.3: Comparison of performance speed between different FASTQ compression methods. Fraction denotes the ratio between the compressed and original file.


Figure 3.4: Comparison of performance speed between different SAM compression methods. Fraction denotes the ratio between the compressed and original file.

CHAPTER 4

DISCUSSION AND OUTLOOK

4.1 HTS data compression workflow

The approaches to storing and compressing HTS data described in Chapter 2 share a substantial number of underlying ideas. In general, all the methods can be divided into two main categories: general ones and ones based on read preprocessing. The main difference between these two approaches is that in the former the reads (primarily stored in FASTQ format) can be compressed "on the fly", generating a compressed archive almost immediately. In the latter, the reads go through a number of preprocessing steps in order to reduce the entropy of the input data, possibly casting the data into a different and more easily compressible form. During this process, they may also produce a number of temporary files and require a number of intermediate synchronization points between the separable data processing stages before the actual compression can be started. The HTS data compression process as a whole can be compared to the general data compression workflow presented in Fig. 1.8, where each of the different read preprocessing steps can be perceived as a form of data transformation step.

Moreover, aligning raw reads to a reference genome, producing alignments in SAM format, and sorting the resulting alignments by their mapping position (generating a new file as a result) can also be seen as two combined complex data transformation steps. Although the primary purpose of generating the intermediate mapping results in SAM format is usually to perform an analysis of possible genomic variants (as explained in Section 1.1.3), the SAM compression methods can also be used as an efficient way to store the HTS data. Therefore, different approaches to compressing HTS data can be applied depending on the data formats used in

the genomic data analysis pipelines, the storage capabilities, and the data processing requirements. Finally, storing the information from a re-sequencing experiment as a set of variants in VCF format, following the re-sequencing data analysis workflow as depicted in Fig. 1.3, can also be perceived as a form of HTS data compression, albeit a lossy one, as recovering the raw reads is not possible.

4.2 Lossless compression of HTS data

4.2.1 Compression of sequence and alignment data

When compressing the raw reads in FASTQ format, the most commonly used methods are still the general ones. Their main advantages include implementing computationally lightweight algorithms while providing a satisfactory compression ratio and high compression speeds. This allows the HTS data to be compressed "on the fly", as they are being generated. These methods usually also provide a relatively high decompression speed, allowing for an efficient ingestion of the data in the following data processing steps, such as mapping. Moreover, no additional information is required in order to compress or to decompress the data, which can sometimes be a big asset. Unfortunately, the reads represented in FASTQ format are still, in the majority of cases, compressed using general-purpose text compression methods, such as gzip. These methods, although easily parallelizable, provide an unsatisfactory compression ratio, as has been shown in a number of experiments in the previous chapter. Perhaps the most striking drawback of compressing FASTQ files using gzip is that all the reads' data are lumped together and compressed in chunks. As shown in the previous chapter, just splitting the reads' content into different data streams and compressing them separately can already provide a significant reduction in storage space, even when using the same compression algorithm. Hence, the general solutions we developed, such as DSRC2 or the CARGO-based ones, can achieve a significantly better compression ratio than the commonly used gzip. Moreover, they are also easily parallelizable and do not sacrifice performance speed at the cost of providing a high compression ratio, as the other state-of-the-art solutions do. Therefore, these solutions can easily replace the existing gzip-based compression tools and be adopted in genomic data processing pipelines.

Moreover, compressing FASTQ files using reads-reordering-based (e.g., FaStore, SCALCE [70]) or assembly-based methods (e.g., LEON [15]) usually allows achieving a superior compression ratio compared to the general ones. These methods can exploit the high sequence redundancy present in the HTS data, which increases together with the sequencing depth of the sample. The possible improvements in space savings come at the cost of performing an additional pass or passes over the input data prior to performing the actual compression. Because these methods spend more resources during the compression stage, they can reduce the size of the data significantly. Moreover, they usually do not apply any additional data transformations to the decompressed reads (e.g., reordering of the reads), providing relatively high decompression speeds. Therefore, such methods can be especially useful in data-sharing scenarios, where the raw reads can be compressed once into the smallest size possible with the goal of being shared efficiently. Alternatively, they can also be a good solution for long-term storage. However, it is important to mention that these solutions perform read preprocessing operations on a "global" scope and, hence, their scalability may depend on the sequencing depth of the input dataset.

Consider now the reads mapped to the reference sequence and stored as alignments in SAM format. The produced SAM files occupy a significant amount of space – more than the input raw FASTQ file (as shown in Tab. 3.1) – since the reads are annotated with the alignment information and some content originating from the raw reads may also appear duplicated (as in the case of reporting alternative alignments). Therefore, the data stored in the resulting SAM files can be considered a superset of the data contained in the input FASTQ file(s). Interestingly, however, compressing such annotated reads sorted by their mapping position, using the gzip-based BGZF algorithm and storing them in BAM format, gives better compression results than compressing the input FASTQ just by using gzip (both methods are considered the de facto standard). These gains in space savings are primarily due to the locality of the sequence data, where sequences sharing a high degree of similarity reside close to each other in the file, ordered by their mapping position in the reference genome. Nonetheless, one needs to remember that storing the alignments in BAM format is still a highly inefficient approach, as the alignment data is not decoupled and, similarly to compressing FASTQ files using gzip, all the data is lumped together and compressed. Storing the content of each of the SAM fields in a separate stream and compressing each stream independently using a more appropriate compressor already provides a significant reduction in the size of the resulting file.

The above-mentioned SAM format compression methods, although compressing reads mapped to a reference genome, do not require the reference sequence to be present either during compression or during decompression. They are considered general methods, compressing the alignment data as it is. However, when the reference sequence (which was used during the sequence mapping process) is available, using reference-based compression methods can further reduce the file size, especially reducing the sequence and alignment data overhead. With the sequence mapping information, the sequence with its possible mismatches can be encoded differentially with respect to the reference sequence and so does not need to be stored verbatim. The newest SCRAMBLE-based implementation of the CRAM format [18] and the CARGO-based solution show the gains that can be achieved. Another advantage of using a reference-based compression method is that some of the optional fields generated during the mapping stage, such as MD (a string denoting observed mismatching positions) or NM (the edit distance with respect to the reference sequence), can be automatically re-generated during decompression, allowing for additional reductions in storage costs. Therefore, these methods can be used as an efficient way to store the HTS data in compressed SAM format.
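The core idea of such differential encoding can be sketched as follows (a simplified illustration assuming an ungapped alignment that lies fully within the reference; real formats such as CRAM also encode CIGAR operations, insertions and deletions):

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch of reference-based sequence encoding: the read is represented by
// its mapping position plus the positions/bases where it differs from the
// reference; the full read is rebuilt from the reference at decompression.
struct RefEncodedRead {
    int64_t pos;                                  // 0-based mapping position
    int32_t length;                               // read length
    std::vector<std::pair<int32_t, char>> diffs;  // (offset in read, substituted base)
};

RefEncodedRead encode(const std::string& reference, int64_t pos, const std::string& read) {
    RefEncodedRead enc{pos, static_cast<int32_t>(read.size()), {}};
    for (int32_t i = 0; i < enc.length; ++i)
        if (read[i] != reference[pos + i])
            enc.diffs.emplace_back(i, read[i]);
    return enc;
}

std::string decode(const std::string& reference, const RefEncodedRead& enc) {
    std::string read = reference.substr(enc.pos, enc.length);
    for (const auto& d : enc.diffs) read[d.first] = d.second;
    return read;
}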

However, the significant improvement in compression of alignment data achievable by reference-based methods comes at a cost. Mainly, the decompression process requires the exact reference sequence used while compressing (and mapping) to be present, otherwise it is not possible to recover the compressed data. Therefore, explicit mechanisms or protocols to identify, manage and access the reference sequence in a non-ambiguous way are required, especially if the data is to be distributed. Nonetheless, as a partial solution to this problem, the CRAM format supports embedding the reference sequence in the compressed archive, which does not degrade the compression ratio for datasets with high coverage. Otherwise, non-reference-based solutions should be considered. As a side note, in CARGO-based solutions embedding a reference sequence would also be possible, since datasets of different type and format can be stored inside a single CARGO container. Therefore, storing a compressed reference sequence as one (e.g., FASTA) dataset and referencing it from another single or multiple SAM datasets would be an option definitely worth considering.

4.2.2 Compression of read identifiers

Apart from possibly improving the compression ratio of the sequence (and alignment) data, changing the initial order of the reads also has a significant impact on the compression ratio of the read identifiers. The DNA sequences present in the reads in the initial FASTQ files, as generated by the sequencer, are assumed to be stored in random order (but preserving the information about the pairing between the sequences originating from the same fragment). Only the content present in the consecutive read identifiers in the same FASTQ file exhibits some sort of order, where, e.g., the numerical values associated with some tokens (such as arbitrary read numbers or flowcell coordinates) tend to appear in increasing order. Therefore, as has been shown in the previous chapter, methods compressing the read identifiers of the raw reads stored in the non-altered order achieve the best compression results compared to the ones that reorder the reads. The same applies in the case of compressing SAM files, where the reads are reordered according to their mapping position in the reference genome. There is thus a trade-off to consider: although reordering the reads can achieve superior compression results for sequence and alignment data, it does so at the cost of hampering the compression of the read identifiers.

Therefore, in FaStore, to mitigate the problem of compressing reordered read identifiers when compressing FASTQ files generated from a paired-end library, we keep one read identifier per pair of reads, only encoding the possible differences between them (which would usually be a token value of 1 or 2 at some position in the identifier, indicating which read of the pair it is). A similar way of handling paired reads could possibly be implemented both in DSRC2 and in the CARGO-based FASTQ compression solutions.

4.2.3 Potential challenges when compressing reordered reads

As has been previously noted, read preprocessing methods can provide the best compression gains for the sequence and alignment data, but at the cost of degrading the compression ratio of the read identifiers. Moreover, altering the order of the reads should not, in general, influence the compression of quality scores. Nonetheless, there is another issue which should be addressed when reordering the reads. As previously mentioned, the initial order of the DNA sequences generated by the sequencing machine (and stored in FASTQ files) is random. This is an important assumption, which helps to ensure the correctness of the read mapping process. More specifically, some mappers, such as BWA-MEM [108] or GEM [127], when processing the FASTQ reads chunk-by-chunk in multi-threaded mode, try to estimate the paired-end library insert size (see Section 1.1.2) by analyzing the reads locally inside the chunks of data. In order to possibly achieve the same mapping results, the reads should also be generated in a random order after decompression (assessing the randomness of the read order per se can already be a challenging problem). As pointed out in [56], for some mappers randomly re-shuffling the reads in the input FASTQ files can already lead to reporting different alignments (and variant calling results). These differences, however, are mostly due to reads which map to difficult genomic regions and which contain a high degree of sequence repetition. Therefore, if after decompression the reads in the FASTQ files are clustered in groups in an invalid way (e.g., the reads originating from a complex genomic region are stored sorted by mapping position), then some mappers may estimate the insert size incorrectly. Hence, as a hint, a theoretical fragment size (if available) set during the library preparation step could possibly be used by such mappers to help assess the correctness of the mapping process. Otherwise, performing an additional read reordering step after decompression can be considered, such as random reordering or sorting the reads by read identifier (if available) to recover their order as in the initial FASTQ files generated by the sequencer. As a side note, in order to retrieve properly paired raw FASTQ reads from the alignments in SAM format when using SAMTools [111], one needs to first sort the SAM alignments by their read identifiers.

4.3 Exploring lossy compression methods for HTS data

4.3.1 Compression of read identifiers

There are multiple possibilities that one can consider when storing HTS data with a controlled degree of information loss, where the selection of the kinds of data to preserve depends on the use-cases, the data processing pipelines and the storage requirements. The only kind of data which is required to be stored in a lossless manner is the DNA sequence. However, losing the initial order of the reads as produced by the sequencer should not be considered as information loss, as long as the pairing information between the reads is preserved. Unfortunately, both in FASTQ and SAM formats the representation of read pairing is not clearly defined. As mentioned in Section 1.1.3, in FASTQ files the pairing is primarily carried by the read identifiers, where, in addition, the paired reads reside at the same positions in two separate files or are stored interleaved in one file. In the case of the SAM format, the pairing information is represented in a number of fields, but only for aligned reads – for unaligned reads it is kept in the read identifiers. Therefore, when considering lossy compression of the read identifiers, preserving the pairing information between the reads must be ensured. In principle, preserving the pairing information between the reads in the identifier field requires the pair of reads to have a unique identifier shared between them.

Nonetheless, some data present in the initial FASTQ read identifiers can be discarded during the data processing stages in genomic pipelines. For example, when performing read mapping, mappers by default trim the read identifiers by removing the comments, i.e., the content after the first whitespace symbol. Such a decision is required to produce non-ambiguous query template names when generating SAM alignments – a whitespace symbol present in a query template name could break the SAM format parsing algorithms, which split the SAM record line into a number of fields using whitespace characters as separators. However, some sequencing protocols which use barcoding can provide the barcode sequence stored in the read comment field. In such cases, mappers such as BWA-MEM [108] provide an option to store the barcode sequence as one of the optional fields (using the BC tag). Moreover, after the mapping, some genomic data analysis protocols may use some data from the read identifier during the alignment post-processing stage. For example, in the case of HTS data produced by Illumina platforms, when looking for optical sequence duplicates (see Section 1.1.3 for more details), the machine-specific flowcell coordinates stored in the main part of the read identifier are sometimes used. Nonetheless, after performing the read mapping and further alignment post-processing steps (and possibly removing the read duplicates), the most important requirement is to keep a unique identifier per read or per pair of reads (which is also the requirement for preserving the pairing information between the reads using the current formats).

With the above in mind, in FaStore we primarily focused on the lossy compression of read identifiers, implementing two methods (apart from the lossless one). In the first one, we remove the comments in the identifiers, in a way similar to that performed by mappers by default, which already gives significant savings in storage space. In the second one, we do not store the identifiers at all, generating only a unique string per read (and a unique string per pair of reads in the paired-end mode) while decompressing the data – the generated identifiers will be different from the initial ones. Skipping the identifiers therefore allows for great savings in storage space. Alternatively, in DSRC2 we provide an option to select only a subset of fields (or tokens) from the identifiers to be preserved. In this way, for example, preserving only the unique read number and the flowcell coordinates can easily be set by the user. Although in CARGO we did not explore lossy compression methods for the read identifiers, adding a possible filtering of the read identifier fields (both for SAM and FASTQ formats) should be a straightforward task, involving minimal modifications in the record parsing code.
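The two lossy identifier treatments described above can be sketched as follows (hypothetical helper functions, not the FaStore implementation):

#include <cstdint>
#include <string>

// Mode 1: drop the comment, i.e. everything after the first whitespace,
// mirroring what most mappers do by default.
std::string stripComment(const std::string& id) {
    const auto ws = id.find_first_of(" \t");
    return ws == std::string::npos ? id : id.substr(0, ws);
}

// Mode 2: do not store identifiers at all; on decompression, generate a
// fresh unique name per read (or per pair), e.g. from a running counter.
std::string regenerateId(uint64_t readIndex, int mateNumber /* 1 or 2 */) {
    return "read." + std::to_string(readIndex) + "/" + std::to_string(mateNumber);
}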

4.3.2 Compression of quality scores

As has been shown in a number of experiments in Chapter 3 and pointed out by numerous researchers [19, 126], quality scores are the most challenging type of data to compress, being characterized by a high entropy and containing a non-negligible amount of signal noise. They occupy the majority of the compressed space, both in FASTQ and SAM formats, greatly hampering the efficient sharing of the meaningful information carried by the sequencing reads.

As mentioned in Section 1.1.3, the FASTQ format allows for representing up to 93 distinct quality score values (encoded as human-readable ASCII symbols). However, in practice usually no more than 60 values are used. Moreover, when performing analysis of HTS re-sequencing data which has been sequenced at high coverage, having such a vast resolution of values for representing quality scores becomes unnecessary. The information about the DNA sequence alignments plays the most important role, as a significant number of reads should be available to support the evidence of possible occurrences of variants. As has been shown in Chapter 3, the efficiency of compressing DNA sequences increases with the sequencing coverage of the datasets. By contrast, the efficiency of compressing quality scores basically remains unchanged. Considering that per-base quality scores serve only as auxiliary information in variant calling, are not frequently used by the mappers1, and express only the probability of an erroneous base call (not covering any other possible errors which could occur during the sequencing process), sacrificing such a vast amount of space to store them is unnecessary.

Moreover, Illumina, starting with their HiSeq 2500 sequencing platform, introduced an option to generate quality scores at a reduced resolution. This scheme, also known as “Illumina binning” [80], reduces the available range of quality values to only 8 values, significantly improving the compression ratio. In addition, it was also shown in [80] that producing quality scores at this reduced resolution does not degrade the efficiency of variant calling. Therefore, this strategy has been applied by default in the newer sequencers, such as the HiSeq 3000, NextSeq, or HiSeq X family. As a proof-of-concept, in DSRC2 we implemented this binning strategy, showing how much space can be saved by compressing FASTQ files with a reduced resolution of quality scores. Similarly, in CARGO, we applied the same strategy when compressing SAM files. In FaStore we went a bit further and explored a number of different quality compression schemes. In addition to the lossless and binning schemes, we integrated a method proposed by [126], which aims to reduce the mean square error of the quality values, significantly improving the compression ratio. Moreover, we implemented a more restrictive scheme, namely binary thresholding, in which quality scores are represented as either “good” or “bad” (depending on the specified threshold) and which allows the size of the compressed data to be greatly reduced.

1Apart from being used as one of the factors to calculate the overall quality of mapping.

The schemes implemented in FaStore could also be easily implemented in the CARGO framework, either as separate codecs or as data transformation steps – the possible integration of the novel compression methods into the CARGO framework will be discussed below.
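As an illustration, a sketch of the binning and thresholding transformations is given below; the bin boundaries shown are one commonly cited variant of the Illumina 8-level mapping and may differ between instrument and software versions:

// One commonly cited variant of the Illumina 8-level quality binning
// (exact bin boundaries depend on the instrument and software version);
// a binary-thresholding variant is shown alongside for comparison.
inline int binQuality(int q) {
    if (q < 2)  return q;    // 0/1 ("no call") kept as-is
    if (q < 10) return 6;
    if (q < 20) return 15;
    if (q < 25) return 22;
    if (q < 30) return 27;
    if (q < 35) return 33;
    if (q < 40) return 37;
    return 40;
}

inline char thresholdQuality(int q, int threshold = 20) {
    return q >= threshold ? 'I' : '#';   // "good" (Q40) or "bad" (Q2) in Phred+33
}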

Therefore, with the lossy compression methods in mind, to reduce the storage requirements for the HTS data one would need to select an appropriate lossy quality score compression scheme depending on the experimental setup or on the genomic data analysis pipeline used. However, this can be a very challenging task due to the absence of “ground truth” datasets with which the results of the analyses using lossily-processed files can be compared. Nonetheless, some recent initiatives, such as GIAB [222], provide a limited set of high-confidence SNP and INDEL variant calls, which can already be used as a starting point.

4.3.3 Compression of SAM optional fields

During the read mapping stage and during the post-processing stage of the alignments, some auxiliary alignment data can be generated. This extra data is primarily stored in the SAM optional fields. Some examples include: read group names (RG tag), the edit distance to the reference sequence (NM tag), the observed mismatching positions (MD tag) and a set of fields related to different quality scores. During the base quality score recalibration step, as performed using GATK [44, 139], the alignments can be annotated with recalibrated quality values corresponding to the insertion (optional field with BI tag) or deletion (BD tag) type of variant (if any of them appear). Moreover, when modifications of the original quality scores (represented in the SAM field QUAL) have been applied during any alignment post-processing step, the original quality values may be stored in the optional field denoted with the tag OQ. Therefore, at the end of alignment post-processing, different quality values can be stored in multiple fields, greatly increasing the size of the file. As a side note, in our comparison tests performed in Section 3.5, the SAM alignments contain only the original quality scores of the input raw reads, as they have been generated directly by the mapper and only sorted by mapping position.

As previously mentioned, most of the intermediate data generated during the mapping and alignment post-processing stages is stored in the optional fields and is primarily used as supporting data for variant calling. The raw FASTQ reads, mapped to a reference sequence and later annotated with intermediate data in SAM format, can occupy up to twice the initial space (or even more when storing different data related to quality scores), whether represented in raw text format or compressed in BAM format. Therefore, the decision to preserve the optional fields should primarily depend on the data storage and data processing use-cases. For example, when considering archival or long-term storage of the data with the possibility of performing alignment post-processing in the future, a significant amount of the optional data can easily be discarded. Moreover, some of the optional fields, such as MD and NM, can be re-generated during the decompression stage by some reference-based SAM format compressors. Therefore, to reduce the storage requirements of the alignment data to the minimum, the alignments could be stored in the form initially generated by the mapper, using efficient reference-based SAM format compressors such as the CARGO-based ones or tools implementing the newest version of the CRAM format.

4.4 Integration of the developed methods

4.4.1 Integration between the methods

During our research we have designed and developed a number of HTS data compression methods, both lossless and lossy, with results comparable to or even better than the state-of-the-art. Some of these methods have been implemented as standalone compressors, such as DSRC2, ORCOM (compression of DNA sequences only), or FaStore, and they are used to compress raw reads in FASTQ format. Moreover, by designing the CARGO framework to enable efficient representation, storage and processing of HTS data, many arbitrary format-free compression solutions can be easily and automatically generated. As a proof-of-concept, CARGO-based FASTQ and SAM format compression tools have been developed. Therefore, as a natural consequence, a possible integration between the designed methods should be considered.

In general, there are two possible ways to perform the integration among the developed solutions. In the first one, the compression methods implemented in DSRC2, ORCOM, and FaStore could be integrated into the

CARGO framework. More specifically, the sequence, quality, and read identifier compression algorithms could be implemented as standalone codecs in CARGO. When generating CARGO-based compression solutions, these codecs (available alongside the current ones: gzip, bzip2, LZMA, and PPMd) could be selected explicitly by the user during the record data type definition step. This way, when processing the data, the new codecs could be applied to compress the specified data streams. Having a rich set of different codecs would also allow for rapid prototyping and fast exploration of novel compression solutions aimed at storing HTS data in an efficient way. For example, one could easily generate a CARGO-based SAM format compressor utilizing the read identifier or quality score compression algorithms from DSRC2 or FaStore.
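A hypothetical, framework-agnostic codec interface (explicitly not the actual CARGO TypeAPI) illustrates how such stream-specific codecs could be plugged in:

#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical codec interface: each data stream (IDs, sequences,
// qualities, ...) is handled by a codec chosen at record-type definition time.
class StreamCodec {
public:
    virtual ~StreamCodec() = default;
    virtual std::string name() const = 0;
    virtual std::vector<uint8_t> compress(const std::vector<uint8_t>& block) = 0;
    virtual std::vector<uint8_t> decompress(const std::vector<uint8_t>& block) = 0;
};

// A binding of a stream to a codec, so that e.g. the quality stream could
// use a FaStore-derived quality codec and the identifier stream a
// token-based one.
struct StreamBinding {
    std::string streamName;               // e.g. "QUA"
    std::unique_ptr<StreamCodec> codec;   // e.g. a PPMd- or FaStore-derived codec
};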

However, in the case of integrating the ORCOM or FaStore DNA sequence compression methods into CARGO, providing a special read reordering stage would be required, since FaStore uses a minimizer-based read reordering technique prior to the actual data compression. Fortunately, the CARGO framework provides the functionality of a custom user-defined transformation of the records applied prior to compression (or, also, directly after decompression). In this way, a read-reordering-based transformation as defined in ORCOM (and implemented as the binning stage) or FaStore (the binning and, possibly, re-binning stages) can be implemented in the generated record transformation module. In the next step, the actual DNA sequence compression codec based on FaStore or ORCOM could be selected to compress the preprocessed reads. Moreover, FaStore-based read reordering and compression methods could also be applied to compressing the unaligned reads when using a reference-based SAM format compression solution generated by CARGO, further improving the compression ratio.
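The read-reordering key itself can be sketched as follows (a simplified, lexicographic minimizer; the actual ORCOM/FaStore implementations use more elaborate minimizer definitions and parameters):

#include <string>

// Sketch of minimizer-based bucketing: the lexicographically smallest k-mer
// of a read is used as its bucket key, so that overlapping reads tend to
// end up in the same bucket and can be compressed together.
// k = 8 is an arbitrary choice for illustration.
std::string minimizer(const std::string& read, std::size_t k = 8) {
    if (read.size() < k) return read;
    std::string best = read.substr(0, k);
    for (std::size_t i = 1; i + k <= read.size(); ++i) {
        std::string kmer = read.substr(i, k);
        if (kmer < best) best = kmer;
    }
    return best;  // reads sharing this key are grouped before compression
}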

A second idea for the integration of the methods developed could be to use the CARGO framework as a data storage back-end for DSRC2, ORCOM, or FaStore. More specifically, all the (possibly compressed) data streams can be stored inside CARGO containers. Multiple datasets of different formats can be stored in one container. Each dataset is treated as a collection of records of a uniform type predefined by the user, where the content of the records is decoupled into separate data streams (following the classical approach described in Chapter 2). Each of the streams can be compressed independently and stored as a collection of (possibly compressed) blocks inside the container. All these data separation and compression steps are performed automatically by CARGO. In this way, for example, when compressing FASTQ files using DSRC2, the read identifiers, DNA sequences, and quality scores could be stored in separate CARGO streams. Moreover, storing the encoded read data in streams as in ORCOM or FaStore would be equivalent to storing the data in CARGO streams. Therefore, the functionality of CARGO streams and containers could be used by other external solutions to store HTS data, without requiring them to define their own archive format.
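A minimal sketch of this decoupling (illustrative only, not CARGO code): each FASTQ record is split field by field into separate in-memory streams, each of which a container could then compress and store as its own sequence of blocks.

```cpp
#include <string>

// One parsed FASTQ record.
struct FastqRecord {
    std::string id, sequence, quality;
};

// Per-field streams; in a container each would be compressed independently.
struct FastqStreams {
    std::string ids, sequences, qualities;
    void append(const FastqRecord& r) {
        ids += r.id + '\n';
        sequences += r.sequence + '\n';
        qualities += r.quality + '\n';
    }
};

int main() {
    FastqStreams s;
    s.append({"@read1", "ACGT", "IIII"});
    s.append({"@read2", "GGCA", "IIHH"});
    // s.ids, s.sequences and s.qualities would now each be handed to its own codec.
}
```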

4.4.2 Integration through a common API

To facilitate the storage of data inside containers, CARGO defines a simplified API layer, namely TypeAPI (implemented in the C++ programming language). It exposes the functionality of CARGO streams and allows for a transparent decomposition of the records’ content, storing the data in streams. This way, given a high-level record type definition in the CARGO meta-language provided by the user, a corresponding low-level definition of the record in C++ and in TypeAPI can be automatically generated. This definition is then used to automatically generate a CARGO-based application template code in C++, which can be customized by the user. Although the current API definition and the offered functionality are still in their infancy and show several limitations (primarily because they are used only when generating standalone applications linked with the CARGO framework), the generated C++ record definition, together with the framework, could possibly be used to develop external applications that store data inside CARGO containers.

In DSRC2 we also introduced a simple, high-level Application Programming Interface (API) implemented in both the C++ and Python programming languages. Its main features include compressing and decompressing single FASTQ records or whole FASTQ files/DSRC archives, with compression settings specified manually by the user (or chosen automatically). It allows for an easy integration of the DSRC2 compressor functionality with other external applications written in these languages. More importantly, when using this API to compress or decompress data, applications can operate directly on buffered data while it resides in memory, reducing the amount of resources spent on streaming the data or handling temporary files.

As a good reference, a more complete API has been proposed in the HTSlib and HTSjdk libraries (http://www.htslib.org), which provide support for working with HTS data stored in the SAM, BAM, CRAM, VCF, and BCF formats. For example, they provide functionality for reading, writing, and searching (by genomic position), performed both on the (compressed) files and on single (or multiple) records. These libraries are also used by the most popular genomic data processing and analysis tools such as SAMtools, Picard, or GATK. Clearly, reducing the amount of CPU time spent on reading/writing operations from/to disk and on data transcoding/parsing when accessing HTS data is highly desirable, since the data already occupies a large amount of space. Therefore, refining the API in CARGO and integrating it as a simple proof of concept with the other developed solutions would be a good exercise to encourage the integration of other external applications, not only those oriented towards data compression. As a side note, some solutions, such as ADAM [136] or Goby [24], provide their own data processing ecosystem (ADAM) or data management framework (Goby) to store and access HTS data. However, the problematic aspect of these solutions is that external applications need to be built around the specific ecosystem or framework. This, combined with their own internal HTS data representations, can limit their practical usability in current genomic data processing pipelines, as additional CPU time will also be spent on inter-format conversions.
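As an illustration, the condensed C++ sketch below iterates over the alignment records of a SAM/BAM/CRAM file through HTSlib's C API (error handling is omitted for brevity; the calls shown are, to the best of our knowledge, part of HTSlib's public interface):

```cpp
// Compile with: g++ example.cpp -lhts
#include <htslib/sam.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    samFile* in = sam_open(argv[1], "r");       // SAM, BAM or CRAM
    bam_hdr_t* hdr = sam_hdr_read(in);          // header with reference names
    bam1_t* rec = bam_init1();
    while (sam_read1(in, hdr, rec) >= 0) {      // one alignment record at a time
        const char* chrom =
            rec->core.tid >= 0 ? hdr->target_name[rec->core.tid] : "*";
        std::printf("%s\t%s\t%lld\n", bam_get_qname(rec), chrom,
                    (long long)(rec->core.pos + 1));  // 1-based position
    }
    bam_destroy1(rec);
    bam_hdr_destroy(hdr);
    sam_close(in);
}
```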

4.4.3 Integration of the methods with the genomic pipelines

Most commonly, HTS data are stored in compressed form in a number of files, e.g., multiple FASTQ files coming from one experiment. To work with the data, the files can either be fully decompressed or their contents can be decompressed “on-the-fly” and streamed directly to other applications requesting them. Streaming is supported by default by the majority of commonly used HTS data processing applications. Therefore, the integration of the developed solutions with current genomic data pipelines can easily be accomplished by exchanging the compression applications in use for the developed ones. This is the most flexible approach, since the developed solutions can be integrated into current genomics pipelines in a transparent way: it does not require any interference with the external application source code, e.g., to handle working with the data through some API.

For example, DSRC2 can be used as a drop-in replacement for gzip, allowing one to compress and decompress raw FASTQ reads “on-the-fly” or as files. When processing the data in the “fast” mode it offers high compression and decompression speeds, while providing a significant reduction in compressed size compared to gzip. Alternatively, for a closer integration with external applications, it also offers an API to work directly with DSRC2 archives and FASTQ files. For long-term storage of raw FASTQ reads, FaStore is the preferred option. It offers a superior compression ratio compared to the other available state-of-the-art FASTQ solutions, while providing relatively high decompression speeds (at the expense of slower compression). Moreover, with a broad range of extra options to perform controlled lossy compression of the non-sequence data, the additional savings in storage space can be substantial.

When considering the storage and compression of aligned reads in SAM format, CARGO-based solutions also provide some of the highest achievable compression results. They can easily be applied to compressing the alignments whether or not the reference sequence (used during the mapping stage) is available, making them a suitable replacement for BAM and CRAM format-based compression solutions. Similarly, CARGO-based FASTQ format compressors can also be integrated into current genomic data processing pipelines in a transparent way. Moreover, one of the most important features of the CARGO framework is its independence from the HTS data format, offering a high degree of flexibility when working with HTS data. Potentially, any data format can be modeled and an efficient compressor can be semi-automatically generated (as has been done for the FASTQ and SAM formats). Therefore, if an external application such as a mapper or assembler needed to output intermediate results in a form not fitting any of the currently available formats, CARGO could easily provide a way to represent and store the data in compressed form. This can be especially useful when designing and developing pipelines, as one is not tied to any specific genomic data representation or file format. Moreover, multiple datasets of different formats can be stored in one container, allowing for an easy encapsulation of the data originating from a project or experiment. With a possible integration of all the developed methods and a refinement of the current API, such a framework could provide an even more complete solution supporting the majority of genomic data-intensive workflows in an efficient way.

4.5 Future directions

As already briefly mentioned in the previous section, one of the most important next steps would be the integration of the developed solutions. During the integration, providing a new and refined API would also be a very important feature, allowing for easier integration of external tools. Moreover, it would also be worthwhile to design the API to be compatible with the one currently being developed by the Global Alliance for Genomics and Health (GA4GH, https://www.genomicsandhealth.org/) [22]. The main goal of GA4GH is to define and develop a set of unified methods and protocols to enable responsible, privacy-preserving, and effective sharing of genomic and clinical data among institutions and organizations around the world. These methods and protocols are independent of the underlying data format. Therefore, for our integrated solutions, designing a robust API compatible with the one provided by GA4GH would be a solid step forward into the future of efficient genomic data sharing.

Another idea worth exploring could be an alternative (and better structured) representation of raw and aligned reads, for which the FASTQ and SAM formats are currently used as de facto standards. Unfortunately, these formats, as shown in Sections 1.1.3 and 3.5, are greatly limited and cumbersome, hampering the efficiency of compression, as some information cannot be properly decoupled. For example, in the FASTQ format, additional meta-data or read annotations can be stored only in the read identifier line, which is itself a free-format field and does not follow any standard. Moreover, in both formats, the pairing information between the reads is preserved in an ill-defined way. In addition, in the SAM format, multiple different pieces of alignment information are stored together in one field of packed binary flags, which could easily be decoupled into a number of separate fields (e.g., of boolean type), as sketched after this paragraph. This is also one of the primary motivations behind the ongoing work of the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO), which in the past developed a number of popular video and audio formats used worldwide. Today, the MPEG group is exploring a new data representation and format for storing raw and aligned reads which could also provide higher compression than the current state-of-the-art.
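As a minimal illustration of that decoupling, the sketch below expands the packed SAM FLAG integer into named boolean fields, using the bit values defined by the SAM specification:

```cpp
#include <cstdint>
#include <iostream>

// The twelve alignment properties packed into the SAM FLAG field.
struct AlignmentFlags {
    bool paired, proper_pair, unmapped, mate_unmapped;
    bool reverse, mate_reverse, first_in_pair, second_in_pair;
    bool secondary, qc_fail, duplicate, supplementary;
};

AlignmentFlags decode(uint16_t flag) {
    return {
        bool(flag & 0x1),   bool(flag & 0x2),   bool(flag & 0x4),   bool(flag & 0x8),
        bool(flag & 0x10),  bool(flag & 0x20),  bool(flag & 0x40),  bool(flag & 0x80),
        bool(flag & 0x100), bool(flag & 0x200), bool(flag & 0x400), bool(flag & 0x800)};
}

int main() {
    // FLAG 99 = paired + proper pair + mate on reverse strand + first in pair.
    AlignmentFlags f = decode(99);
    std::cout << f.paired << f.proper_pair << f.mate_reverse << f.first_in_pair << '\n';
}
```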

Moreover, as has been shown in Section 2.4, neither the FASTQ nor the SAM format is suitable for representing the data produced by the emerging third-generation sequencing platforms. The generated long-read data are accompanied by a significant amount of meta-data with additional sequencing-process-specific information, which cannot feasibly be represented in either FASTQ or SAM format in a lossless manner. For example, in order to store the sequencing data generated by the PacBio platform and aligned to a reference genome, a number of extensions to the existing SAM format specification have been added. However, the conversion to SAM format is still a lossy process, discarding a significant amount of data. A similar problem applies to data generated by the Oxford Nanopore platform. Therefore, it would be worthwhile to explore alternative data representations in the CARGO framework, which could enable efficient storage (in a compressed form) and processing of the data generated by third-generation sequencing platforms. Moreover, a CARGO-based solution could also provide equivalent converters to the FASTQ and SAM formats, or an API to access only a specific subset of the stored data.

CONCLUSIONS

With the rapid development of sequencing technologies, the inception of a number of nation-wide and international-scale sequencing projects, and the advent of personal genome re-sequencing, we are experiencing a flood of genomic data. This is why the main interest of the research described in this work was to explore and develop efficient techniques to compress and store the data generated by high-throughput sequencing platforms.

To summarize, the main contributions of this research are as follows.

• We developed DSRC2, a general, high-performance compressor for files in FASTQ format, based on the methods proposed in [42] with major improvements. DSRC2 can be used either as a standalone compressor or as a library enabling integration with third-party applications and pipelines, allowing such applications to store FASTQ data in compressed form. On average, this allows us to reduce FASTQ files to roughly 20% of their initial size, while providing compression and decompression speeds reaching 500 MB/s when using 8 threads.

• We developed ORCOM, a proof-of-concept compressor for storing short-read sequence data which focuses on providing the maximum compression ratio. ORCOM only supports compression of DNA sequences, and aims to exploit the significant sequence redundancy present in the data generated from deep sequencing experiments. ORCOM allowed us to compress the short-read data in an H. sapiens sample dataset of 134 GB (coverage ~42×) to only 4.3 GB (3.2%). This was the best compression ratio achieved so far, significantly better than the results obtained by the existing state-of-the-art solutions.


• We developed FaStore, a complete solution for compressing FASTQ files which supports short-read data sequenced in both single-end and paired-end configurations. FaStore extends the DNA compression methods introduced in ORCOM and adds a sequence assembly step, further improving the compression ratio. The DNA reads from the same H. sapiens dataset mentioned above can be compressed from 134 GB to only 3.4 GB (2.6%), or to 8.5 GB (6.5%) when preserving the pairing information between the reads. On average, it can compress FASTQ files to about 12.5% of their initial size in lossless mode. When compared to state-of-the-art methods FaStore achieves the best compression ratio, both for storing DNA sequences and full FASTQ reads. With FaStore, we also explored different lossy compression methods to encode read identifiers and base quality scores, showing how much space one can save by selecting an appropriate lossy scheme.

• We developed CARGO, a framework and a general binary format for compressed data, which allows one to semi-automatically create compression systems to store HTS data in configurable containers. One can generate CARGO-based solutions for any genomic file format: the user defines the record data type by using an abstract domain-specific language, alongside custom data transformations and querying and compression methods. As a proof of concept, we generated a family of compressors for the FASTQ format. Some of them achieve compression ratios and/or speeds comparable to those provided by the specialized FASTQ compressors. Then, we created a second family of CARGO-based compressors, this time for the SAM format. They achieve compression results comparable to or better than those obtained by current state-of-the-art SAM compressors. Using our reference-based SAM format compressor with lossy base quality scores we managed to compress a 17-TB subset of SAM alignments from the 1000 Genomes Project [1] to 1.44 TB (8.4%), while still retaining the possibility to range-query the stored data by chromosome and position.

• The research-and-development stages for the projects above happened at different times. As a result, each method was typically tested in a different computational environment and using different input datasets. Across the years we also experienced constant improvements in the state of the art provided by our competitors. This is why, in one of the previous chapters, we performed an additional brief comparison of all our solutions within the context of the current literature, using a concise dataset and the same computational environment. We focused on our lossless compression methods for short-read data, represented both as raw FASTQ reads and as aligned SAM records. We showed that such methods achieve the best results, or results comparable to those obtained by other state-of-the-art methods, in terms of both compression ratio and processing speed.

In short, we developed a variety of compression methods covering a set of different use-cases that can be found whenever genomic data is processed by bioinformatics pipelines or stored for later use. Moreover, some of the methods we proposed are format-independent, and can possibly be re-used for applications not considered here – for instance, to store data produced by emerging third-generation sequencing platforms. In general, all the solutions developed here allow for straightforward integration with existing genomic data processing pipelines, and can serve as a practical replacement for file-based data compression. We believe these solutions represent significant and reasonable steps toward a more accomplished approach to properly incorporating data compression into bioinformatics pipelines.

BIBLIOGRAPHY

[1] 1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061–1073, 2010. [cited at p.13,18, 68, 81, and 172 ]

[2] 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56– 65, 2012. [cited at p.13 and18 ]

[3] 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015. [cited at p. 13]

[4] D. Abadi, P.Boncz, S. Harizopoulos, S. Idreos, S. Madden, et al. The design and implementation of modern column-oriented database systems, volume 5. Now Publishers, Inc., 2013. [cited at p.79 ]

[5] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2(1):53– 86, 2004. [cited at p.22 ]

[6] K. Adhikari, T. Fontanil, S. Cal, J. Mendoza-Revilla, M. Fuentes- Guajardo, J.-C. Chacón-Duque, F. Al-Saadi, J. A. Johansson, M. Quinto-Sanchez, V. Acuña-Alonzo, et al. A genome-wide associ- ation scan in admixed Latin Americans identifies loci influencing facial and scalp hair features. Nature communications, 7:10815, 2016. [cited at p.2 ]

[7] T. S. Alioto, I. Buchhalter, S. Derdak, B. Hutter, M. D. Eldridge, E. Hovig, L. E. Heisler, T. A. Beck, J. T. Simpson, L. Tonon, et al. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nature communications, 6, 2015. [cited at p.27 ]


[8] C. Alkan, B. P.Coe, and E. E. Eichler. Genome structural variation discovery and genotyping. Nature Reviews Genetics, 12(5):363–376, 2011. [cited at p.27 ]

[9] C. Alkan, J. M. Kidd, T. Marques-Bonet, G. Aksay, F.Antonacci, F.Hor- mozdiari, J. O. Kitzman, C. Baker, M. Malig, O. Mutlu, et al. Person- alized copy number and segmental duplication maps using next- generation sequencing. Nature genetics, 41(10):1061–1067, 2009. [cited at p.22 and23 ]

[10] S. J. Aronson and H. L. Rehm. Building the foundation for genomics in precision medicine. Nature, 526(7573):336–342, 2015. [cited at p. 14 and 15]

[11] G. A. Auwera, M. O. Carneiro, C. Hartl, R. Poplin, G. del Angel, A. Levy- Moonshine, T. Jordan, K. Shakir, D. Roazen, J. Thibault, et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Current protocols in bioinformatics, pages 11–10, 2013. [cited at p.18,24,25, 26, 27, and 138 ]

[12] R. Baeza-Yates and G. Navarro. Faster approximate string matching. Algorithmica, 23(2):127–158, 1999. [cited at p.23 ]

[13] T. Batu and S. C. Sahinalp. Locally consistent parsing and applica- tions to approximate string comparisons. In International Confer- ence on Developments in Language Theory, pages 22–35. Springer, 2005. [cited at p.62 ]

[14] M. J. Bauer, A. J. Cox, and G. Rosone. Lightweight BWT construction for very large string collections. In Annual Symposium on Combi- natorial Pattern Matching, pages 219–231. Springer, 2011. [cited at p. 63]

[15] G. Benoit, C. Lemaitre, D. Lavenier, E. Drezen, T. Dayris, R. Uri- caru, and G. Rizk. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph. BMC bioinfor- matics, 16(1), 2015. [cited at p.59,62,65, 67, 139, 155, and 203 ]

[16] J. L. Bentley, D. D. Sleator, R. E. Tarjan, and V. K. Wei. A locally adaptive data compression scheme. Communications of the ACM, 29(4):320–330, 1986. [cited at p.49 ]

[17] K. Berlin, S. Koren, C.-S. Chin, J. P. Drake, J. M. Landolin, and A. M. Phillippy. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nature biotechnology, 33(6):623–630, 2015. [cited at p. 11]

[18] J. K. Bonfield. The Scramble conversion tool. Bioinformatics, 30(19):2818–2819, 2014. [cited at p. 71, 72, 75, 76, 77, 78, 140, 156, and 206]

[19] J. K. Bonfield and M. V. Mahoney. Compression of FASTQ and SAM format sequencing data. PloS one, 8(3), 2013. [cited at p.54, 57, 58,59, 60, 61,64,65,67,72,73, 76, 77, 78, 139, and 160 ]

[20] F.Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese. An improved construction for counting bloom filters. In European Symposium on Algorithms, pages 684–695. Springer, 2006. [cited at p. 62]

[21] K. R. Bradnam, J. N. Fass, A. Alexandrov, P. Baranay, M. Bechner, I. Birol, S. Boisvert, J. A. Chapman, G. Chapuis, R. Chikhi, et al. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience, 2(1), 2013. [cited at p.21 ]

[22] J. Burn. A federated ecosystem for sharing genomic, clinical data. Science, 352(6291):1278–1280, 2016. [cited at p. 168]

[23] M. Burrows and D. J. Wheeler. A block-sorting lossless data com- pression algorithm. Technical report, 1994. [cited at p.52 ]

[24] F. Campagne, K. C. Dorff, N. Chambwe, J. T. Robinson, and J. P. Mesirov. Compression of structured high-throughput sequencing data. PLoS One, 8(11), 2013. [cited at p.84 and 166 ]

[25] R. Cánovas and A. Moffat. Practical compression for multi- alignment genomic files. In Proceedings of the Thirty-Sixth Aus- tralasian Computer Science Conference-Volume 135, pages 51–60. Australian Computer Society, Inc., 2013. [cited at p.70,73, and 76 ]

[26] M. J. Chaisson, J. Huddleston, M. Y. Dennis, P.H. Sudmant, M. Malig, F. Hormozdiari, F. Antonacci, U. Surti, R. Sandstrom, M. Boitano, et al. Resolving the complexity of the human genome using single- molecule sequencing. Nature, 517(7536):608–611, 2015. [cited at p. 11]

[27] K. Chen, J. W. Wallis, M. D. McLellan, D. E. Larson, J. M. Kalicki, C. S. Pohl, S. D. McGrath, M. C. Wendl, Q. Zhang, D. P. Locke, et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature methods, 6(9):677–681, 2009. [cited at p. 27]

[28] X. Chen, O. Schulz-Trieglaff, R. Shaw, B. Barnes, F. Schlesinger, M. Källberg, A. J. Cox, S. Kruglyak, and C. T. Saunders. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics, 32(8):1220–1222, 2016. [cited at p.27 ]

[29] K. Cibulskis, M. S. Lawrence, S. L. Carter, A. Sivachenko, D. Jaffe, C. Sougnez, S. Gabriel, M. Meyerson, E. S. Lander, and G. Getz. Sensitive detection of somatic point mutations in impure and het- erogeneous cancer samples. Nature biotechnology, 31(3):213–219, 2013. [cited at p.27 ]

[30] R. Cijvat, S. Manegold, M. Kersten, G. W. Klau, A. Schönhuth, T. Marschall, and Y. Zhang. Genome sequence analysis with Mon- etDB. Datenbank-Spektrum, 15(3):185–191, 2015. [cited at p.79 ]

[31] J. Clarke, H.-C. Wu, L. Jayasinghe, A. Patel, S. Reid, and H. Bayley. Continuous base identification for single-molecule nanopore DNA sequencing. Nature nanotechnology, 4(4):265–270, 2009. [cited at p. 11]

[32] J. Cleary and I. Witten. Data compression using adaptive coding and partial string matching. IEEE transactions on Communications, 32(4):396–402, 1984. [cited at p.41 ]

[33] G. Cochrane, C. E. Cook, and E. Birney. The future of DNA sequence archiving. GigaScience, 1(1), 2012. [cited at p.54 ]

[34] P. J. Cock, C. J. Fields, N. Goto, M. L. Heuer, and P. M. Rice. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic acids research, 38(6):1767– 1771, 2010. [cited at p.16,28,29, and 60 ]

[35] E. M. Coonrod, J. D. Durtschi, R. L. Margraf, and K. V. Voelkerding. Developing genome and exome sequencing for candidate gene identification in inherited disorders: an integrated technical and bioinformatics approach. Archives of pathology & laboratory medicine, 137(3):415–433, 2013. [cited at p. 15]

[36] A. Cornish-Bowden. IUPAC-IUB symbols for nucleotide nomencla- ture. Nucleic Acids Res, 13(3021):30, 1985. [cited at p.28 ]

[37] A. J. Cox, M. J. Bauer, T. Jakobi, and G. Rosone. Large-scale com- pression of genomic sequence databases with the Burrows–Wheeler transform. Bioinformatics, 28(11):1415–1419, 2012. [cited at p.63 ]

[38] C. Cruchaga, C. M. Karch, S. C. Jin, B. A. Benitez, Y. Cai, R. Guerreiro, O. Harari, J. Norton, J. Budde, S. Bertelsen, et al. Rare coding variants in the phospholipase D3 gene confer risk for Alzheimer's disease. Nature, 505(7484):550–554, 2014. [cited at p. 2]

[39] P.Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. De- Pristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, et al. The variant call format and VCFtools. Bioinformatics, 27(15):2156–2158, 2011. [cited at p.18,33, and 35 ]

[40] D. Decap, J. Reumers, C. Herzeel, P. Costanza, and J. Fostier. Hal- vade: scalable sequence analysis with MapReduce. Bioinformatics, 31(15):2482–2488, 2015. [cited at p.18 ]

[41] S. Deorowicz. Universal lossless data compression algorithms. Phi- losophy Dissertation Thesis, Gliwice, 2003. [cited at p.38 and43 ]

[42] S. Deorowicz and S. Grabowski. Compression of DNA sequence reads in FASTQ format. Bioinformatics, 27(6):860–862, 2011. [cited at p. 58, 64, 93, and 171]

[43] S. Deorowicz and S. Grabowski. Data compression for sequencing data. Algorithms for Molecular Biology, 8(1), 2013. [cited at p.54 and56 ]

[44] M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A. Philippakis, G. Del Angel, M. A. Rivas, M. Hanna, et al. A framework for variation discovery and genotyping using next- generation DNA sequencing data. Nature genetics, 43(5):491–498, 2011. [cited at p.18,25, 26, and 162 ]

[45] K. C. Dorff, N. Chambwe, Z. Zeno, M. Simi, R. Shaknovich, and F. Campagne. GobyWeb: simplified management and analysis of gene expression and DNA methylation sequencing data. PLoS One, 8(7), 2013. [cited at p. 84]

[46] S. Dorok. The relational way to dam the flood of genome data. In Proceedings of the 2015 ACM SIGMOD on PhD Symposium, pages 9–13. ACM, 2015. [cited at p.80 ]

[47] J. Duda. Asymmetric numeral systems. arXiv preprint arXiv:0902.0271, 2009. [cited at p.48 and71 ]

[48] J. Duda, K. Tahboub, N. J. Gadgil, and E. J. Delp. The use of asymmet- ric numeral systems as an accurate replacement for huffman coding. In Picture Coding Symposium (PCS), 2015, pages 65–69. IEEE, 2015. [cited at p.48 ]

[49] A. Dutta, M. M. Haque, T. Bose, C. Reddy, and S. S. Mande. FQC: A novel approach for efficient compression, archival, and dissemina- tion of fastq datasets. Journal of bioinformatics and computational biology, 13(03), 2015. [cited at p.59,60,64, and 67 ]

[50] J. Eid, A. Fehr, J. Gray, K. Luong, J. Lyle, G. Otto, P.Peluso, D. Rank, P. Baybayan, B. Bettman, et al. Real-time DNA sequencing from single polymerase molecules. Science, 323(5910):133–138, 2009. [cited at p.10 ]

[51] R. Ekblom and J. B. Wolf. A field guide to whole-genome sequencing, assembly and annotation. Evolutionary applications, 7(9):1026– 1042, 2014. [cited at p.21 ]

[52] P. Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194–203, 1975. [cited at p. 43]

[53] A. D. Ewing, K. E. Houlahan, Y. Hu, K. Ellrott, C. Caloian, T. N. Ya- maguchi, J. C. Bare, C. P’ng, D. Waggott, V. Y. Sabelnykova, et al. Combining tumor genome simulation with crowdsourcing to bench- mark somatic single-nucleotide-variant detection. Nature methods, 12(7):623–630, 2015. [cited at p.27 ]

[54] B. Ewing and P.Green. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome research, 8(3):186–194, 1998. [cited at p.19 ]

[55] P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 390–398. IEEE, 2000. [cited at p. 22 and 53]

[56] C. Firtina and C. Alkan. On genomic repeats and reproducibility. Bioinformatics, 32(15):2243–2247, 2016. [cited at p.62 and 158 ]

[57] M. Folk, G. Heber, Q. Koziol, E. Pourmal, and D. Robinson. An overview of the HDF5 technology suite and its applications. In Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases, pages 36–47. ACM, 2011. [cited at p.82 ]

[58] M. H.-Y. Fritz, R. Leinonen, G. Cochrane, and E. Birney. Efficient storage of high throughput DNA sequencing data using reference- based compression. Genome research, 21(5):734–740, 2011. [cited at p. 74 and 76]

[59] R. Gallager. Variations on a theme by Huffman. IEEE Transactions on Information Theory, 24(6):668–674, 1978. [cited at p.46 ]

[60] E. Garrison and G. Marth. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907, 2012. [cited at p. 26 and 27]

[61] Genome of the Netherlands Consortium et al. Whole-genome se- quence variation, population structure and demographic history of the Dutch population. Nature Genetics, 46(8):818–825, 2014. [cited at p. 14]

[62] T. C. Glenn. Field guide to next-generation DNA sequencers. Molec- ular ecology resources, 11(5):759–769, 2011. [cited at p.8 and9 ]

[63] S. Golomb. Run-length Encodings (Corresp.). IEEE Trans. Inf. Theor., 12(3):399–401, 2006. [cited at p.49 ]

[64] S. W. Golomb. Run-Length Encodings. IEEE Transactions on Infor- mation Theory, 12:399–401, 1966. [cited at p.43 ]

[65] S. Goodwin, J. D. McPherson, and W. R. McCombie. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6):333–351, 2016. [cited at p.8,9, and 11 ]

[66] S. Grabowski, S. Deorowicz, and Ł. Roguski. Disk-based compression of data from genome sequencing. Bioinformatics, 31(9):1389–1395, 2015. [cited at p.93 ]

[67] A. L. Greninger, S. N. Naccache, S. Federman, G. Yu, P. Mbala, V. Bres, D. Stryke, J. Bouquet, S. Somasekar, J. M. Linnen, et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome medicine, 7(1), 2015. [cited at p. 11]

[68] M. Griffith, C. A. Miller, O. L. Griffith, K. Krysiak, Z. L. Skidmore, A. Ramu, J. R. Walker, H. X. Dang, L. Trani, D. E. Larson, et al. Op- timizing cancer genome sequencing and analysis. Cell systems, 1(3):210–223, 2015. [cited at p.15 ]

[69] F.Hach, F.Hormozdiari, C. Alkan, F.Hormozdiari, I. Birol, E. E. Eich- ler, and S. C. Sahinalp. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature methods, 7(8):576–577, 2010. [cited at p. 22]

[70] F.Hach, I. Numanagi´c, C. Alkan, and S. C. Sahinalp. SCALCE: boost- ing sequence compression algorithms using locally consistent en- coding. Bioinformatics, 28(23):3051–3057, 2012. [cited at p. 62, 65, 66, 76, 139, 155, and 203]

[71] F.Hach, I. Numanagic, and S. C. Sahinalp. DeeZ: reference-based compression by local assembly. Nature methods, 11(11):1082–1084, 2014. [cited at p.70,72, 75, 76, 78, 140, and 207 ]

[72] D. Haussler, D. A. Patterson, M. Diekhans, A. Fox, M. Jordan, A. D. Joseph, S. Ma, B. Paten, S. Shenker, T. Sittler, et al. A million cancer genome warehouse. Technical report, DTIC Document, 2012. [cited at p. 54]

[73] M. Henderson. Genetic mapping of babies by 2019 will transform preventive medicine. The Times. 9th February, 2009. [cited at p.14 ]

[74] D. S. Hirschberg and D. A. Lelewer. Context modeling for text com- pression. In Image and Text Compression, pages 113–144. Springer, 1992. [cited at p.40 and41 ]

[75] M. Howison. High-throughput compression of FASTQ data with Se- qDB. IEEE/ACM Transactions on Computational Biology and Bioin- formatics (TCBB), 10(1):213–218, 2013. [cited at p.56 and84 ]

[76] D. A. Huffman et al. A method for the construction of minimum- redundancy codes. Proceedings of the IRE, 40(9):1098–1101, 1952. [cited at p.44 ]

[77] S. Hwang, E. Kim, I. Lee, and E. M. Marcotte. Systematic comparison of variant calling pipelines using gold standard personal exome variants. Scientific reports, 5, 2015. [cited at p. 26]

[78] S. Idreos, F. Groffen, N. Nes, S. Manegold, S. Mullender, M. Ker- sten, et al. MonetDB: Two decades of research in column-oriented database architectures. Bulletin of the IEEE Computer Society Tech- nical Committee on Data Engineering, 35(1):40–45, 2012. [cited at p. 79]

[79] Illumina. Estimating Sequencing Coverage. Technical report, 2014. [cited at p.6 ]

[80] Illumina. Reducing Whole-Genome Data Storage Footprint. Techni- cal report, 2014. [cited at p.65,66,77, and 161 ]

[81] International Human Genome Sequencing Consortium et al. Fin- ishing the euchromatic sequence of the human genome. Nature, 431(7011):931–945, 2004. [cited at p.5 ]

[82] L. Janin, G. Rosone, and A. J. Cox. Adaptive reference-free compres- sion of sequence quality scores. Bioinformatics, 30(1):24–30, 2013. [cited at p.67 ]

[83] L. Janin, O. Schulz-Trieglaff, and A. J. Cox. BEETL-fastq: a searchable compressed archive for DNA reads. Bioinformatics, 30(19):2796– 2801, 2014. [cited at p.63 ]

[84] P. D. Johnson Jr, G. A. Harris, and D. Hankerson. Introduction to information theory and data compression. CRC press, 2003. [cited at p. 43]

[85] D. C. Jones, W. L. Ruzzo, X. Peng, and M. G. Katze. Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucleic acids research, 40(22), 2012. [cited at p. 59,60, 62, 65, 72,73,76, 139, and 202 ]

[86] C. Kandoth, M. D. McLellan, F.Vandin, K. Ye, B. Niu, C. Lu, M. Xie, Q. Zhang, J. F. McMichael, M. A. Wyczalkowski, et al. Mutational landscape and significance across 12 major cancer types. Nature, 502(7471):333–339, 2013. [cited at p.2 ]

[87] C. Kingsford and R. Patro. Reference-based compression of short-read sequences using path encoding. Bioinformatics, 31(12):1920–1928, 2015. [cited at p. 63]

[88] L. Klein and A. Taaheri. HDF-EOS5 Data Model, and Library. Technical Report July 2006, 2007. [cited at p.83 ]

[89] D. E. Knuth. Dynamic huffman coding. Journal of algorithms, 6(2):163–180, 1985. [cited at p.45 ]

[90] D. C. Koboldt, K. Chen, T. Wylie, D. E. Larson, M. D. McLellan, E. R. Mardis, G. M. Weinstock, R. K. Wilson, and L. Ding. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics, 25(17):2283–2285, 2009. [cited at p.27 ]

[91] D. C. Koboldt, Q. Zhang, D. E. Larson, D. Shen, M. D. McLellan, L. Lin, C. A. Miller, E. R. Mardis, L. Ding, and R. K. Wilson. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome research, 22(3):568–576, 2012. [cited at p. 27]

[92] Y. Kodama, M. Shumway, and R. Leinonen. The Sequence Read Archive: explosive growth of sequencing data. Nucleic acids research, 40(D1):D54–D56, 2012. [cited at p.54 ]

[93] C. Kozanitis, C. Saunders, S. Kruglyak, V. Bafna, and G. Varghese. Compressing genomic sequence fragments using SlimGene. Journal of Computational Biology, 18(3):401–413, 2011. [cited at p.64 and74 ]

[94] V.Kuleshov, C. Jiang, W. Zhou, F.Jahanbani, S. Batzoglou, and M. Sny- der. Synthetic long-read sequencing reveals intraspecies diversity in the human microbiome. Nature biotechnology, 34(1):64–69, 2016. [cited at p.12 ]

[95] D. Laehnemann, A. Borkhardt, and A. C. McHardy. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction. Briefings in bioinformatics, 17(1):154–179, 2016. [cited at p.18,20, and 64 ]

[96] E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Bald- win, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, et al. Initial sequenc- ing and analysis of the human genome. Nature, 409(6822):860–921, 2001. [cited at p.10 ]

[97] E. S. Lander, L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, et al. Initial sequencing and analysis of the human genome. Nature, 409(6822):860–921, 2001. [cited at p. 6 and 12]

[98] E. S. Lander and M. S. Waterman. Genomic mapping by fingerprint- ing random clones: a mathematical analysis. Genomics, 2(3):231– 239, 1988. [cited at p.5 ]

[99] G. G. Langdon. An introduction to arithmetic coding. IBM Journal of Research and Development, 28(2):135–149, 1984. [cited at p.46 ]

[100] B. Langmead and S. L. Salzberg. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4):357–359, 2012. [cited at p.22 and23 ]

[101] B. Langmead, C. Trapnell, M. Pop, and S. L. Salzberg. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology, 10(3), 2009. [cited at p.22 ]

[102] R. M. Layer, C. Chiang, A. R. Quinlan, and I. M. Hall. LUMPY: a probabilistic framework for structural variant discovery. Genome biology, 15(6), 2014. [cited at p.27 ]

[103] R. Leinonen, H. Sugawara, and M. Shumway. The sequence read archive. Nucleic acids research, 39:D19–D21, 2010. [cited at p.81 ]

[104] M. Lek, K. J. Karczewski, E. V. Minikel, K. E. Samocha, E. Banks, T. Fennell, A. H. O’Donnell-Luria, J. S. Ware, A. J. Hill, B. B. Cum- mings, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616):285–291, 2016. [cited at p.13 ]

[105] M. J. Levene, J. Korlach, S. W. Turner, M. Foquet, H. G. Craighead, and W. W. Webb. Zero-mode waveguides for single-molecule analysis at high concentrations. Science, 299(5607):682–686, 2003. [cited at p. 10]

[106] D. Levy, G. B. Ehret, K. Rice, G. C. Verwoert, L. J. Launer, A. De- hghan, N. L. Glazer, A. C. Morrison, A. D. Johnson, T. Aspelund, et al. Genome-wide association study of blood pressure and hypertension. Nature genetics, 41(6):677–687, 2009. [cited at p.2 ]

[107] S. Levy, G. Sutton, P. C. Ng, L. Feuk, A. L. Halpern, B. P. Walenz, N. Axelrod, J. Huang, E. F. Kirkness, G. Denisov, et al. The diploid genome sequence of an individual human. PLoS Biol, 5(10), 2007. [cited at p.2,5, and 14 ]

[108] H. Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997, 2013. [cited at p. 22, 23, 138, 158, 159, and 201]

[109] H. Li. Towards better understanding of artifacts in variant call- ing from high-coverage samples. Bioinformatics, 30(20):2843–2851, 2014. [cited at p.25 and26 ]

[110] H. Li and R. Durbin. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics, 26(5):589–595, 2010. [cited at p.22 and84 ]

[111] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, et al. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078–2079, 2009. [cited at p. 17, 18, 23, 26, 30, 33, 158, and 200]

[112] H. Li and N. Homer. A survey of sequence alignment algorithms for next-generation sequencing. Briefings in bioinformatics, 11(5):473– 483, 2010. [cited at p.22 ]

[113] H. Li, J. Ruan, and R. Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome research, 18(11):1851–1858, 2008. [cited at p.22 ]

[114] P. Li, X. Jiang, S. Wang, J. Kim, H. Xiong, and L. Ohno-Machado. HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads. Journal of the American Medical Informatics Associa- tion, 21(2):363–373, 2014. [cited at p.72 and75 ]

[115] R. Li, Y. Li, K. Kristiansen, and J. Wang. SOAP: short oligonucleotide alignment program. Bioinformatics, 24(5):713–714, 2008. [cited at p. 22]

[116] R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, and J. Wang. SOAP2: an improved ultrafast tool for short read alignment. Bioin- formatics, 25(15):1966–1967, 2009. [cited at p.22 ]

[117] Y. Li, H. Zheng, R. Luo, H. Wu, H. Zhu, R. Li, H. Cao, B. Wu, S. Huang, H. Shao, et al. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nature biotechnology, 29(8):723–730, 2011. [cited at p.27 ]

[118] C.-M. Liu, T. Wong, E. Wu, R. Luo, S.-M. Yiu, Y. Li, B. Wang, C. Yu, X. Chu, K. Zhao, et al. SOAP3: ultra-fast GPU-based parallel alignment tool for short reads. Bioinformatics, 28(6):878–879, 2012. [cited at p. 75]

[119] K. E. Lohmueller, T. Sparsø, Q. Li, E. Andersson, T. Korneliussen, A. Albrechtsen, K. Banasik, N. Grarup, I. Hallgrimsdottir, K. Kiil, et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. The American Journal of Human Genetics, 93(6):1072–1086, 2013. [cited at p.14 ]

[120] N. J. Loman, J. Quick, and J. T. Simpson. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature methods, 12(8):733–735, 2015. [cited at p.11 ]

[121] N. J. Loman and A. R. Quinlan. Poretools: a toolkit for analyzing nanopore sequence data. Bioinformatics, 30(23):3399–3401, 2014. [cited at p.90 ]

[122] R. Luo, Y.-L. Wong, W.-C. Law, L.-K. Lee, J. Cheung, C.-M. Liu, and T.-W. Lam. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU. PeerJ, 2, 2014. [cited at p.18 ]

[123] J. MacQueen et al. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, volume 1, pages 281–297. Oakland, CA, USA., 1967. [cited at p.67 ]

[124] M. Mahoney. Data compression explained. mattmahoney. net, up- dated May, 7, 2012. [cited at p.41 ]

[125] M. V. Mahoney. Adaptive weighing of context models for lossless data compression. Technical report, 2005. [cited at p.59 ]

[126] G. Malysa, M. Hernaez, I. Ochoa, M. Rao, K. Ganesan, and T. Weiss- man. QVZ: lossy compression of quality values. Bioinformatics, 31(19):3122–3129, 2015. [cited at p.67,76, 160, and 161 ]

[127] S. Marco-Sola, M. Sammeth, R. Guigó, and P. Ribeca. The GEM mapper: fast, accurate and versatile alignment by filtration. Nature methods, 9(12):1185–1188, 2012. [cited at p.22,23, and 158 ]

[128] E. R. Mardis. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet., 9:387–402, 2008. [cited at p.9 ]

[129] E. R. Mardis. A decade's perspective on DNA sequencing technology. Nature, 470(7333):198–203, 2011. [cited at p. 7]

[130] E. R. Mardis. Next-generation sequencing platforms. Annual review of analytical chemistry, 6:287–303, 2013. [cited at p.9 ]

[131] M. Margulies, M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Be- mben, J. Berka, M. S. Braverman, Y.-J. Chen, Z. Chen, et al. Genome sequencing in microfabricated high-density picolitre reactors. Na- ture, 437(7057):376–380, 2005. [cited at p.9 ]

[132] M. Nelson. Arithmetic Coding + Statistical Modeling = Data Compression. Dr. Dobb's Journal, 1991. [cited at p. 41]

[133] M. Marmor, K. Hertzmark, S. M. Thomas, P.N. Halkitis, and M. Vogler. Resistance to HIV infection. Journal of urban health, 83(1):5–17, 2006. [cited at p.2 ]

[134] G. N. N. Martin. Range encoding: an algorithm for removing redun- dancy from a digitised message. In Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording, 1979. [cited at p.48 ]

[135] C. E. Mason, P. Zumbo, S. Sanders, M. Folk, D. Robinson, R. Aydt, M. Gollery, M. Welsh, N. E. Olson, and T. M. Smith. Standardizing the next generation of bioinformatics software development with BioHDF (HDF5). In Advances in Computational Biology, pages 693– 700. Springer, 2010. [cited at p.83 ]

[136] M. Massie, F. Nothaft, C. Hartl, C. Kozanitis, A. Schumacher, A. D. Joseph, and D. A. Patterson. Adam: Genomics formats and process- ing patterns for cloud scale computing. University of California, Berkeley Technical Report, No. UCB/EECS-2013, 207, 2013. [cited at p. 86 and 166]

[137] A. M. Maxam and W. Gilbert. A new method for sequencing DNA. Proceedings of the National Academy of Sciences, 74(2):560–564, 1977. [cited at p.3 ]

[138] R. S. McBean, C. A. Hyland, and R. L. Flower. Approaches to determi- nation of a full profile of blood group genotypes: single nucleotide variant mapping and massively parallel sequencing. Computational and structural biotechnology journal, 11(19):147–151, 2014. [cited at p.2 ]

[139] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research, 20(9):1297–1303, 2010. [cited at p. 18, 26, and 162]

[140] I. Mendizabal, O. Lao, U. M. Marigorta, A. Wollstein, L. Gusmão, V. Ferak, M. Ioana, A. Jordanova, R. Kaneva, A. Kouvatsi, et al. Recon- structing the population history of European Romani from genome- wide data. Current Biology, 22(24):2342–2349, 2012. [cited at p.2 ]

[141] M. L. Metzker. Sequencing technologies—the next generation. Na- ture reviews genetics, 11(1):31–46, 2010. [cited at p.9 ]

[142] A. E. Minoche, J. C. Dohm, and H. Himmelbauer. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome biology, 12(11), 2011. [cited at p.64 ]

[143] A. Moffat. Implementing the PPM data compression scheme. IEEE Transactions on communications, 38(11):1917–1921, 1990. [cited at p. 41]

[144] M. Morey, A. Fernández-Marmiesse, D. Castiñeiras, J. M. Fraga, M. L. Couce, and J. A. Cocho. A glimpse into past, present, and future DNA sequencing. Molecular genetics and metabolism, 110(1):3–24, 2013. [cited at p.5 and9 ]

[145] N. Nagarajan and M. Pop. Sequence assembly demystified. Nature Reviews Genetics, 14(3):157–167, 2013. [cited at p.16,20, and 21 ]

[146] G. Narzisi and B. Mishra. Comparing de novo genome assembly: the long and short of it. PloS one, 6(4), 2011. [cited at p.21 ]

[147] S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 48(3):443–453, 1970. [cited at p. 23]

[148] S. Nik-Zainal, H. Davies, J. Staaf, M. Ramakrishna, D. Glodzik, X. Zou, I. Martincorena, L. B. Alexandrov, S. Martin, D. C. Wedge, et al. Land- scape of somatic mutations in 560 breast cancer whole-genome sequences. Nature, 534(7605):47–54, 2016. [cited at p.2 ]

[149] F. Nothaft. Scalable Genome Resequencing with ADAM and avocado. Technical report, Tech. Report No.: UCB/EECS-20IS-6S, UC Berkeley, 2015. [cited at p. 86]

[150] F.A. Nothaft, M. Massie, T. Danford, Z. Zhang, U. Laserson, C. Yeksi- gian, J. Kottalam, A. Ahuja, J. Hammerbacher, M. Linderman, et al. Rethinking data-intensive science using scalable analytics systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 631–646. ACM, 2015. [cited at p.86 ]

[151] I. Numanagić, J. K. Bonfield, F. Hach, J. Voges, J. Ostermann, C. Alberti, M. Mattavelli, and S. C. Sahinalp. Comparison of high-throughput sequencing data compression tools. Nature Methods, 2016. [cited at p. 56 and 139]

[152] I. Ochoa, H. Asnani, D. Bharadia, M. Chowdhury, T. Weissman, and G. Yona. QualComp: a new lossy compressor for quality scores based on rate distortion theory. BMC bioinformatics, 14(1), 2013. [cited at p. 67 and 76]

[153] I. Ochoa, M. Hernaez, R. Goldfeder, T. Weissman, and E. Ashley. Effect of lossy compression of quality scores on variant calling. Brief- ings in bioinformatics, 18(2):183–194, 2016. [cited at p.65 ]

[154] M. V. Olson and A. Varki. Sequencing the chimpanzee genome: insights into human evolution and disease. Nature Reviews Genetics, 4(1):20–28, 2003. [cited at p.2 ]

[155] S. Pabinger, A. Dander, M. Fischer, R. Snajder, M. Sperk, M. Efremova, B. Krabichler, M. R. Speicher, J. Zschocke, and Z. Trajanoski. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in bioinformatics, 15(2):256–278, 2014. [cited at p.15, 18,20, and26 ]

[156] Pacific Biosciences. bas.h5 Reference Guide. Technical report, 2013. [cited at p.89 ]

[157] R. K. Patel and M. Jain. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PloS one, 7(2), 2012. [cited at p. 19]

[158] R. Patro and C. Kingsford. Data-dependent bucketing improves reference-free compression of sequencing reads. Bioinformatics, 31(17):2770–2777, 2015. [cited at p. 63]

[159] J. Peterson, S. Garges, M. Giovanni, P.McInnes, L. Wang, J. A. Schloss, V. Bonazzi, J. E. McEwen, K. A. Wetterstrand, C. Deal, et al. The NIH human microbiome project. Genome research, 19(12):2317–2323, 2009. [cited at p.81 ]

[160] M. Pirooznia, F.S. Goes, and P.P.Zandi. Whole-genome CNV analysis: advances in computational approaches. Frontiers in genetics, 6, 2015. [cited at p.27 ]

[161] M. Pirooznia, M. Kramer, J. Parla, F.S. Goes, J. B. Potash, W. R. Mc- Combie, and P.P.Zandi. Validation and assessment of variant calling pipelines for next-generation sequencing. Human genomics, 8(1), 2014. [cited at p.25 ]

[162] N. Popitsch and A. von Haeseler. NGC: lossless and lossy compres- sion of aligned high-throughput sequencing data. Nucleic acids research, 41(1), 2013. [cited at p.72,73,77, and 78 ]

[163] K. Prüfer, F.Racimo, N. Patterson, F.Jay, S. Sankararaman, S. Sawyer, A. Heinze, G. Renaud, P.H. Sudmant, C. De Filippo, et al. The com- plete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505(7481):43–49, 2014. [cited at p.2 ]

[164] M. J. Puckelwartz, L. L. Pesce, V. Nelakuditi, L. Dellefave-Castillo, J. R. Golbus, S. M. Day, T. P.Cappola, G. W. Dorn, I. T. Foster, and E. M. McNally. Supercomputing for the parallelization of whole genome analysis. Bioinformatics, 30(11):1508–1513, 2014. [cited at p.24 ]

[165] J. Quick, N. J. Loman, S. Duraffour, J. T. Simpson, E. Severi, L. Cow- ley, J. A. Bore, R. Koundouno, G. Dudas, A. Mikhail, et al. Real- time, portable genome sequencing for Ebola surveillance. Nature, 530(7589):228–232, 2016. [cited at p.11 ]

[166] C. Raczy, R. Petrovski, C. T. Saunders, I. Chorny, S. Kruglyak, E. H. Margulies, H.-Y. Chuang, M. Källberg, S. A. Kumar, A. Liao, et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina se- quencing platforms. Bioinformatics, 29(16):2041–2043, 2013. [cited at p. 18]

[167] T. Rausch, T. Zichner, A. Schlattl, A. M. Stütz, V. Benes, and J. O. Korbel. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics, 28(18):i333–i339, 2012. [cited at p. 27]

[168] K. Reinert, B. Langmead, D. Weese, and D. J. Evers. Alignment of Next-Generation Sequencing Reads. Annual review of genomics and human genetics, 16:133–151, 2015. [cited at p.22 and23 ]

[169] A. Rhoads and K. F. Au. PacBio sequencing and its applications. Genomics, proteomics & bioinformatics, 13(5):278–289, 2015. [cited at p. 11]

[170] A. Rimmer, H. Phan, I. Mathieson, Z. Iqbal, S. R. Twigg, A. O. Wilkie, G. McVean, G. Lunter, WGS500 Consortium, et al. Integrat- ing mapping-, assembly-and haplotype-based approaches for call- ing variants in clinical sequencing applications. Nature genetics, 46(8):912–918, 2014. [cited at p.26 and27 ]

[171] J. Rissanen. Generalized Kraft inequality and arithmetic coding. IBM Journal of research and development, 20(3):198–203, 1976. [cited at p. 46]

[172] M. Roberts, W. Hayes, B. R. Hunt, S. M. Mount, and J. A. Yorke. Re- ducing storage requirements for biological sequence comparison. Bioinformatics, 20(18):3363–3369, 2004. [cited at p.63 ]

[173] J. T. Robinson, H. Thorvaldsdóttir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, and J. P. Mesirov. Integrative genomics viewer. Nature biotechnology, 29(1):24–26, 2011. [cited at p.84 ]

[174] Ł. Roguski and S. Deorowicz. DSRC 2—Industry-oriented compres- sion of FASTQ files. Bioinformatics, 30(15):2213–2215, 2014. [cited at p. 93 and 203]

[175] Ł. Roguski and P.Ribeca. CARGO: effective format-free compressed storage of genomic information. Nucleic acids research, 44(12), 2016. [cited at p.93 and 204 ]

[176] M. N. Sakib, J. Tang, W. J. Zheng, and C.-T. Huang. Improving trans- mission efficiency of large sequence alignment/map (SAM) files. PloS one, 6(12), 2011. [cited at p.72 and78 ]

[177] D. Salomon. Data compression: the complete reference. Springer Science & Business Media, 2004. [cited at p.46,47,48, and 52 ]

[178] S. L. Salzberg, A. M. Phillippy, A. Zimin, D. Puiu, T. Magoc, S. Koren, T. J. Treangen, M. C. Schatz, A. L. Delcher, M. Roberts, et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome research, 22(3):557–567, 2012. [cited at p. 21]

[179] F. Sanger, A. Coulson, T. Friedman, G. Air, B. Barell, N. Brown, J. Fiddes, C. Hitchinson III, P. Slocombe, and M. Smith. Nucleotide sequence of bacteriophage φX174 DNA. Journal of molecular biology, 2(125), 1978. [cited at p. 3]

[180] F. Sanger, S. Nicklen, and A. R. Coulson. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12):5463–5467, 1977. [cited at p.3 ]

[181] C. T. Saunders, W. S. Wong, S. Swamy, J. Becq, L. J. Murray, and R. K. Cheetham. Strelka: accurate somatic small-variant calling from se- quenced tumor–normal sample pairs. Bioinformatics, 28(14):1811– 1817, 2012. [cited at p.25 and27 ]

[182] E. E. Schadt, M. D. Linderman, J. Sorenson, L. Lee, and G. P.Nolan. Computational solutions to large-scale data management and anal- ysis. Nature Reviews Genetics, 11(9):647–657, 2010. [cited at p.16 ]

[183] R. Schmieder and R. Edwards. Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6):863–864, 2011. [cited at p. 19]

[184] G. I. Shamir and N. Merhav. Low-complexity sequential lossless cod- ing for piecewise-stationary memoryless sources. IEEE transactions on information theory, 45(5):1498–1519, 1999. [cited at p.43 ]

[185] C. E. Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3– 55, 2001. [cited at p.41 ]

[186] D. Shkarin. PPM: One step to practicality. In Data Compression Conference, 2002. Proceedings. DCC 2002, pages 202–211. IEEE, 2002. [cited at p.41 ]

[187] D. Sims, I. Sudbery, N. E. Ilott, A. Heger, and C. P.Ponting. Sequenc- ing depth and coverage: key considerations in genomic analyses. Nature Reviews Genetics, 15(2):121–132, 2014. [cited at p.6 and 15 ]

[188] L. M. Smith, J. Z. Sanders, R. J. Kaiser, P. Hughes, C. Dodd, C. R. Connell, C. Heiner, S. Kent, and L. E. Hood. Fluorescence detection in automated DNA sequence analysis. Nature, 321(6071):674–679, 1985. [cited at p. 4]

[189] T. F.Smith and M. S. Waterman. Identification of common molecular subsequences. Journal of molecular biology, 147(1):195–197, 1981. [cited at p.23 ]

[190] J. A. Storer. Data compression: methods and theory. Computer Science Press, Inc., 1988. [cited at p.52 ]

[191] J. A. Storer and T. G. Szymanski. Data compression via textual sub- stitution. Journal of the ACM (JACM), 29(4):928–951, 1982. [cited at p. 52]

[192] P.H. Sudmant, T. Rausch, E. J. Gardner, R. E. Handsaker, A. Abyzov, J. Huddleston, Y. Zhang, K. Ye, G. Jun, M. H.-Y. Fritz, et al. An inte- grated map of structural variation in 2,504 human genomes. Nature, 526(7571):75–81, 2015. [cited at p.13 ]

[193] P.J. Talmud, J. A. Cooper, R. W. Morris, F.Dudbridge, T. Shah, J. Eng- mann, C. Dale, J. White, S. McLachlan, D. Zabaneh, et al. Sixty-five common genetic variants and prediction of type 2 diabetes. Diabetes, page DB_141504, 2014. [cited at p.2 ]

[194] H. Tang, E. Lyons, and C. D. Town. Optical mapping in plant comparative genomics. GigaScience, 4(1), 2015. [cited at p. 21]

[195] L. Tattini, R. D’Aurizio, and A. Magi. Detection of genomic structural variants from next-generation sequencing data. Frontiers in bioengineering and biotechnology, 3, 2015. [cited at p. 27]

[196] W. Tembe, J. Lowey, and E. Suh. G-SQZ: compact encoding of genomic sequence and quality data. Bioinformatics, 26(17):2192–2194, 2010. [cited at p. 56]

[197] J. A. Tennessen, A. W. Bigham, T. D. O’Connor, W. Fu, E. E. Kenny, S. Gravel, S. McGee, R. Do, X. Liu, G. Jun, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science, 337(6090):64–69, 2012. [cited at p. 2]

[198] S. G. Tringe and E. M. Rubin. Metagenomics: DNA sequencing of environmental samples. Nature reviews genetics, 6(11):805–814, 2005. [cited at p. 2]

[199] UK10K Consortium et al. The UK10K project identifies rare variants in health and disease. Nature, 526(7571):82–90, 2015. [cited at p. 14]

[200] K. R. Veeramah and M. F. Hammer. The impact of whole-genome sequencing on the reconstruction of human population history. Nature Reviews Genetics, 15(3):149–162, 2014. [cited at p. 2]

[201] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, et al. The sequence of the human genome. Science, 291(5507):1304–1351, 2001. [cited at p. 5]

[202] D. F. Voytas and C. Gao. Precision genome engineering and agriculture: opportunities and regulatory challenges. PLoS Biol, 12(6), 2014. [cited at p. 2]

[203] N. Waddell, M. Pajic, A.-M. Patch, D. K. Chang, K. S. Kassahn, P. Bailey, A. L. Johns, D. Miller, K. Nones, K. Quek, et al. Whole genomes redefine the mutational landscape of pancreatic cancer. Nature, 518(7540):495–501, 2015. [cited at p. 2]

[204] R. Wan, V. N. Anh, and K. Asai. Transformations for the compression of FASTQ quality scores of next-generation sequencing data. Bioinformatics, 28(5):628–635, 2012. [cited at p. 66]

[205] J. N. Weinstein, E. A. Collisson, G. B. Mills, K. R. M. Shaw, B. A. Ozenberger, K. Ellrott, I. Shmulevich, C. Sander, J. M. Stuart, C. G. A. R. Network, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nature genetics, 45(10):1113–1120, 2013. [cited at p. 14]

[206] T. A. Welch. A technique for high-performance data compression. Computer, 17(6):8–19, 1984. [cited at p. 52 and 75]

[207] D. A. Wheeler, M. Srinivasan, M. Egholm, Y. Shen, L. Chen, A. McGuire, W. He, Y.-J. Chen, V. Makhijani, G. T. Roth, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature, 452(7189):872–876, 2008. [cited at p. 2]

[208] T. White. Hadoop: The definitive guide. O’Reilly Media, Inc., 2012. [cited at p. 86]

[209] R. N. Williams. An extremely fast Ziv-Lempel data compression algorithm. In Data Compression Conference, 1991. DCC’91., pages 362–371. IEEE, 1991. [cited at p. 52]

[210] I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30(6):520–540, 1987. [cited at p. 46]

[211] A. R. Wood, T. Esko, J. Yang, S. Vedantam, T. H. Pers, S. Gustafsson, A. Y. Chu, K. Estrada, J. Luan, Z. Kutalik, et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature genetics, 46(11):1173–1186, 2014. [cited at p. 2]

[212] H. Xin, S. Nahar, R. Zhu, J. Emmons, G. Pekhimenko, C. Kingsford, C. Alkan, and O. Mutlu. Optimal seed solver: optimizing seed selection in read mapping. Bioinformatics, 32(11):1632–1642, 2016. [cited at p. 23]

[213] X. Yang, S. P. Chockalingam, and S. Aluru. A survey of error-correction methods for next-generation sequencing. Briefings in bioinformatics, 14(1):56–66, 2013. [cited at p. 20]

[214] Y. Yang, B. Xie, and J. Yan. Application of next-generation sequencing technology in forensic science. Genomics, proteomics & bioinformatics, 12(5):190–197, 2014. [cited at p. 3]

[215] K. Ye, M. H. Schulz, Q. Long, R. Apweiler, and Z. Ning. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics, 25(21):2865–2871, 2009. [cited at p. 27]

[216] Y. W. Yu, D. Yorukoglu, and B. Berger. Traversing the k-mer landscape of NGS read datasets for quality score sparsification. In International Conference on Research in Computational Molecular Biology, pages 385–399. Springer, 2014. [cited at p. 76]

[217] Y. W. Yu, D. Yorukoglu, J. Peng, and B. Berger. Quality score compression improves genotyping accuracy. Nature biotechnology, 33(3):240–243, 2015. [cited at p. 68 and 76]

[218] M. Zaharia, R. S. Xin, P. Wendell, T. Das, M. Armbrust, A. Dave, X. Meng, J. Rosen, S. Venkataraman, M. J. Franklin, et al. Apache Spark: a unified engine for big data processing. Communications of the ACM, 59(11):56–65, 2016. [cited at p. 86]

[219] Y. Zhang, L. Li, Y. Yang, X. Yang, S. He, and Z. Zhu. Light-weight reference-based compression of FASTQ data. BMC bioinformatics, 16(1), 2015. [cited at p. 59 and 61]

[220] M. Zhao, Q. Wang, Q. Wang, P. Jia, and Z. Zhao. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives. BMC bioinformatics, 14(11), 2013. [cited at p. 26 and 27]

[221] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on information theory, 23(3):337–343, 1977. [cited at p. 50]

[222] J. M. Zook, B. Chapman, J. Wang, D. Mittelman, O. Hofmann, W. Hide, and M. Salit. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nature biotechnology, 32(3):246–251, 2014. [cited at p. 29, 31, 34, 138, 162, and 199]

APPENDIX A

SUPPLEMENTARY MATERIALS

In this Appendix we provide supplementary materials used when performing the brief analysis in Chapter 3. Most of the relevant information can already be found in the supplementary materials of the publications presented in Chapter 3, hence it is not duplicated here.

A.1 Datasets

A.1.1 WEX dataset

The WEX dataset consists of whole-exome sequencing of a H. sapiens individual – the son from the Ashkenazim trio experiment performed by GIAB [222]. It was sequenced on the Illumina HiSeq platform from a paired-end library with coverage ~240. The dataset is available as aligned reads in SAM (BAM) format and can be downloaded from:
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/OsloUniversityHospital_Exome/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.bam

After downloading, the file was renamed to WEX.bam.
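For completeness, the download and rename step could look as follows (a minimal sketch, assuming wget is available; any FTP-capable client would work equally well):

# Hypothetical download-and-rename sketch; URL holds the full GIAB address listed above.
URL="ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/HG002_NA24385_son/OsloUniversityHospital_Exome/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.bam"
wget -O WEX.bam "$URL"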

A.1.2 WGS dataset

The WGS dataset consists of whole-genome sequencing of a H. sapiens individual from the CEPH 1463 family, provided by Illumina Platinum Genomes1. It was sequenced on the Illumina HiSeq platform from a paired-end library with coverage ~235. The dataset is available as 36 FASTQ files (stored in 18 pairs).

1https://www.illumina.com/platinumgenomes.html


However, for our test we only used 3 pairs of files, giving an approximate coverage of 42. The files are available to download from:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174310/ERR174310_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174310/ERR174310_2.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174311/ERR174311_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174311/ERR174311_2.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174312/ERR174312_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174312/ERR174312_2.fastq.gz
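A possible way to fetch these six files (a sketch, assuming wget is available):

# Download the three selected read pairs from the ENA FTP server.
BASE=ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR174
for RUN in ERR174310 ERR174311 ERR174312; do
    for MATE in 1 2; do
        wget "${BASE}/${RUN}/${RUN}_${MATE}.fastq.gz"
    done
done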

A.2 Data preparation

A.2.1 FASTQ files

WEX dataset

We use SAMTools [111] (in version 1.3) to transcode the WEX dataset from BAM format to FASTQ:
# sort the file by read names before transcoding
samtools sort -n WEX.bam > WEX-sorted.bam

# transcode to FASTQ
samtools fastq -1 WEX_1.fastq -2 WEX_2.fastq -0 WEX.fastq WEX-sorted.bam

The result of the conversion is stored as 2 FASTQ files – WEX_1.fastq and WEX_2.fastq. There are no unpaired reads – otherwise, they would have been stored in the WEX.fastq file. As a side note, the resulting FASTQ files carry no comment content in the read identifiers, as it was most probably removed during the mapping process.
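As a quick sanity check (a sketch, not part of the original protocol), the two mate files should contain the same number of records; each FASTQ record spans four lines:

# Print the number of reads in each mate file; the two counts should match.
echo $(( $(wc -l < WEX_1.fastq) / 4 )) $(( $(wc -l < WEX_2.fastq) / 4 ))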

WGS dataset

We decompress all the downloaded files and concatenate them pairwise. During decompression, we also remove the comment content from the read identifiers, which can be done as follows:
gunzip -c ERR17431*_1.fastq.gz | awk '{print $1}' >> WGS_1.fastq
gunzip -c ERR17431*_2.fastq.gz | awk '{print $1}' >> WGS_2.fastq

The result is stored as two FASTQ files: WGS_1.fastq and WGS_2.fastq.

A.2.2 SAM files

WEX dataset

The WEX dataset is originally available in the form of reads mapped to the reference sequence (human genome assembly version 37 (GRCh37)) and sorted by position. As it is stored in the compressed BAM format, it only requires conversion to SAM format:
samtools view -h WEX.bam > WEX.sam

The result is the SAM file WEX.sam.

WGS dataset

The WGS dataset is available only as unaligned raw reads in FASTQ format, hence it requires mapping to the reference sequence and sorting by mapping position. To perform the mapping, we use BWA-MEM [108] (in version 0.7.10) with the same version of the reference sequence as in the case of WEX. It can be downloaded from:
ftp://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz

We decompress the reference sequence using gzip and name the resulting file as REF.fasta.

The mapping of the raw reads and sorting by position is performed as follows:
# index the reference sequence and map the reads to it
bwa index REF.fasta
bwa mem -M REF.fasta WGS_1.fastq WGS_2.fastq > WGS-raw.sam

# convert to BAM format in order to perform sorting
samtools view -b -h WGS-raw.sam > WGS-raw.bam
samtools sort -O sam WGS-raw.bam > WGS.sam

The resulting SAM file will be stored as WGS.sam.
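As a side note, the intermediate WGS-raw.* files could be avoided by piping the mapper output directly into the sorter; a possible sketch (not the procedure used in our tests):

# Map and sort in a single pipeline, writing the position-sorted SAM directly.
bwa mem -M REF.fasta WGS_1.fastq WGS_2.fastq | samtools sort -O sam -o WGS.sam -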

A.3 Running tests

Whenever a compressor supports multi-threading, it is run using 8 processing threads. Program execution time was measured using the command ‘time’, prefixing each program invocation with:
/usr/bin/time
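For example, a PIGZ compression run (described below) would be timed as follows; the -v switch assumes GNU time is installed and additionally reports peak memory usage:

# Time a compression run; resource statistics are printed to stderr.
/usr/bin/time -v pigz -9 -p 8 -c IN.fastq > COMP.gz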

A.3.1 FASTQ format

Standalone tools

When running tools, we use placeholders: IN.fastq to specify the input FASTQ file name, COMP.* to specify the compressed file name and OUT.fastq to specify the output (decompressed) FASTQ file.

PIGZ We tested PIGZ in version 2.3.3, using the following commands:
• To compress:
pigz -9 -p 8 -c IN.fastq > COMP.gz
• To decompress:
pigz -d -p 8 -c COMP.gz > OUT.fastq

FQZCOMP We tested FQZCOMP [85] in version 4.6, which was downloaded from https://sourceforge.net/projects/fqzcomp/. We used the following commands:
• To compress using FQZCOMP-STD:
fqz_comp c IN.fastq COMP.fqz
• To compress using FQZCOMP-MAX:
fqz_comp -n2 -q3 -s8+ -b IN.fastq COMP.fqz
• To decompress:
fqz_comp -d COMP.fqz OUT.fastq

QUIP We tested QUIP [85] in version 1.1.8, which was downloaded from https://github.com/dcjones/quip. We used the following commands:
• To compress using QUIP-FQ-STD:
quip -v -c IN.fastq > COMP.qp
• To compress using QUIP-FQ-MAX:
quip -a -v -c IN.fastq > COMP.qp
• To decompress:
quip -d -c COMP.qp > OUT.fastq

SCALCE We tested SCALCE [70] in version 2.8, which was downloaded from http://sfu-compbio.github.io/scalce/. We used the following commands:
• To compress using SCALCE-SE:
scalce -T 8 -o COMP.sc IN.fastq
• To decompress using SCALCE-SE:
scalce -d -T 8 -o OUT.fastq COMP.sc_1.scalcen
• To compress using SCALCE-PE:
scalce -T 8 -r -o COMP.sc IN.fastq
• To decompress using SCALCE-PE:
scalce -d -r -T 8 -o OUT.fastq COMP.sc_1.scalcen

LEON We tested LEON [15] in version 1.0.0, which was downloaded from http://gatb.inria.fr/software/leon/. We used the following commands:
• To compress:
leon -c -file IN.fastq -nb-cores 8 -lossless
• To decompress:
leon -d -file IN.leon -nb-cores 8

DSRC2 We tested DSRC2 [174] in version 2.1.0, which was downloaded from https://github.com/lrog/dsrc. We used the following commands:
• To compress using DSRC-FAST:
dsrc c -m0 -t 8 -v IN.fastq COMP.dsrc
• To compress using DSRC-MAX:
dsrc c -m2 -t 8 -v IN.fastq COMP.dsrc
• To decompress:
dsrc d -t 8 COMP.dsrc OUT.fastq

FASTORE We tested FASTORE in the development version 0.8. The compression process consists of: (a) binning reads, (b) optional re-binning (only in C1 mode) and (c) compression. Therefore, compression with any of the FASTORE-* solutions is a multi-step process running a set of sub-applications: fastore_bin, fastore_rebin (in C1 mode) and fastore_pack; a wrapper sketch chaining these steps is given after the command list below.

• To bin the reads in single-end mode:
fastore_bin e -i"IN.fastq" -o"__tmp.bin" -p8 -s10 -H -q0 -t8
• To bin the reads in paired-end mode:
fastore_bin e -i"IN_1.fastq IN_2.fastq" -o"__tmp.bin" -p8 -s10 -H -q0 -t8 -z
• To compress in C0 mode:
fastore_pack e -i"__tmp.bin" -o"COMP." -f256 -c10 -d8 -w256 -t8 [PE]
where PE specifies the paired-end mode (‘-z’ switch) – in single-end mode it is left empty.
• To re-bin the reads in C1 mode:
fastore_rebin e -i"__tmp.bin" -o"__tmp_2.bin" -p2 -w1024 -t8 [W] [PE]
fastore_rebin e -i"__tmp_2.bin" -o"__tmp_4.bin" -p4 -w1024 -t8 [W] [PE]
fastore_rebin e -i"__tmp_4.bin" -o"__tmp_8.bin" -p8 -w1024 -t8 [W] [PE]
where W specifies the size of the matching window for paired-end mode (‘-W1024’ switch) – in single-end mode it is left empty.
• To compress in C1 mode:
fastore_pack e -i"__tmp_8.bin" -o"COMP.pack" -f256 -c10 -d8 -w256 -t8 [W] [PE]
• To decompress in single-end mode:
fastore_pack d -i"COMP.pack" -o"OUT.fastq" -t8
• To decompress in paired-end mode:
fastore_pack d -i"COMP.pack" -o"OUT_1.fastq OUT_2.fastq" -t8 -z
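The following is a minimal wrapper sketch combining the above steps for single-end C1 compression (paired-end runs would additionally pass the -z and -W1024 switches, as described above):

# Sketch: FASTORE C1 compression of a single-end FASTQ file, chaining
# binning, three re-binning rounds and the final packing step.
set -e
fastore_bin e -i"IN.fastq" -o"__tmp.bin" -p8 -s10 -H -q0 -t8
fastore_rebin e -i"__tmp.bin" -o"__tmp_2.bin" -p2 -w1024 -t8
fastore_rebin e -i"__tmp_2.bin" -o"__tmp_4.bin" -p4 -w1024 -t8
fastore_rebin e -i"__tmp_4.bin" -o"__tmp_8.bin" -p8 -w1024 -t8
fastore_pack e -i"__tmp_8.bin" -o"COMP.pack" -f256 -c10 -d8 -w256 -t8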

CARGO-based solutions

We tested CARGO-based FASTQ compressor solutions [175] using the CARGO framework in version 0.7.1, which was downloaded from https://bio-cargo.sourceforge.net. Apart from the placeholders IN.fastq and OUT.fastq introduced in the previous section, when working with CARGO-based solutions we additionally use: CONTAINER to specify the name of the CARGO container, and DATASET to specify the name of the dataset under which the data is stored in the container.

Container creation Before running tests, we created temporary CARGO containers to store the compressed data, separately for the WEX and WGS datasets. For the WEX dataset we created a container able to hold compressed data up to 11.3 GB in size, by running the command:
cargo_tool --create-container --container-file=CONTAINER --large-block-size=8 --large-block-count=20 --small-block-size=256 --small-block-count=32

The container created this way consists of 20 × 64 large blocks, each 8 MiB in size, and 32 × 64 small blocks, each 256 KiB in size. The total available size equals ~10.5 GiB (11 274 289 152 B).
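The total capacity can be verified with a quick shell calculation (a sketch):

# 20*64 large blocks of 8 MiB plus 32*64 small blocks of 256 KiB.
echo $(( 20*64*8*1024*1024 + 32*64*256*1024 ))   # prints 11274289152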

Similarly, we created a large container for storing the WGS dataset, able to hold compressed data up to 86.4 GB in size, by running the command:
cargo_tool --create-container --container-file=CONTAINER --large-block-size=8 --large-block-count=160 --small-block-size=256 --small-block-count=32

As a side note, after compressing the files the container can optionally be shrunk to adapt its size to the size of the compressed content, by removing unused blocks:
cargo_tool --shrink-container --container-file=CONTAINER

Moreover, after performing a single test (compression and decompression), the container can be cleared, removing all the stored datasets:
cargo_tool --clear-container --container-file=CONTAINER

Running CARGO All the generated CARGO-FQ-* solutions are invoked in a similar way, that is:
• To compress:
cargo_fastqrecord_toolkit-* c -v -c CONTAINER -n DATASET -t 8 -b 8 -i IN.fastq
• To decompress:
cargo_fastqrecord_toolkit-* d -v -c CONTAINER -n DATASET -t 8 -o OUT.fastq
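Putting the above together, one complete test iteration for a CARGO-FQ-* solution on the WEX dataset could be scripted as below (a sketch; TOOL stands for the generated binary, and the ‘-std’ suffix is only a hypothetical example):

# Sketch of a full CARGO test cycle: create a container, compress,
# optionally shrink to inspect the compressed size, decompress, clear.
set -e
TOOL=cargo_fastqrecord_toolkit-std   # hypothetical binary name
cargo_tool --create-container --container-file=CONTAINER --large-block-size=8 --large-block-count=20 --small-block-size=256 --small-block-count=32
$TOOL c -v -c CONTAINER -n DATASET -t 8 -b 8 -i IN.fastq
cargo_tool --shrink-container --container-file=CONTAINER
$TOOL d -v -c CONTAINER -n DATASET -t 8 -o OUT.fastq
cargo_tool --clear-container --container-file=CONTAINER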

A.3.2 SAM format

Reference sequence

In some scenarios, a reference sequence file REF.fasta is used – it is the same file as the one used during the FASTQ read mapping stage, as presented in the previous section.

Moreover, when running SCRAMBLE in the reference-based compression mode (SCRAMBLE-REF-* solutions), the reference sequence needs to be indexed prior to compression, e.g. using SAMTools. It needs to be done only once, by running:
samtools faidx REF.fasta

Analogously, when compressing SAM files using the CARGO-SAM-REF solution, a reference sequence needs to be previously indexed, by running (only once):
cargo_samrecord_toolkit-ref r -i REF.fasta -o REF.fasta.bff

Running standalone tools

When running tools, we use placeholders: IN.sam to specify the input SAM file name, COMP.* to specify the compressed file name and OUT.sam to specify the output (decompressed) SAM file.

SCRAMBLE We tested SCRAMBLE [18] in versions 1.13.10 and 1.14.9, implementing the CRAM format in versions 2 (SCRAMBLE-CRAM2-*) and 3 (SCRAMBLE-CRAM3-*) respectively. The solution was downloaded from https://sourceforge.net/projects/staden/. The compressed sizes were reported using the program cram_dump. We ran SCRAMBLE-CRAM2-* using the binary scramble-v2 and SCRAMBLE-CRAM3-* using the binary scramble-v3, with the following command lines:
• To compress using SCRAMBLE-BAM:
scramble-v2 -I sam -O bam -m -t 8 IN.sam > COMP.bam
• To decompress using SCRAMBLE-BAM:
scramble-v2 -I bam -O sam -m -t 8 COMP.bam > OUT.sam
• To compress using SCRAMBLE-CRAM2-REF:
scramble-v2 -I sam -O cram -m -r REF.fasta -t 8 IN.sam > COMP.cram
• To decompress using SCRAMBLE-CRAM2-REF:
scramble-v2 -I cram -O sam -m -r REF.fasta -t 8 COMP.cram > OUT.sam
• To compress using SCRAMBLE-CRAM3-REF:
scramble-v3 -I sam -O cram -p -P -m -r REF.fasta -t 8 IN.sam > COMP.cram
• To decompress using SCRAMBLE-CRAM3-REF:
scramble-v3 -I cram -O sam -p -P -m -r REF.fasta -t 8 COMP.cram > OUT.sam
• To compress using SCRAMBLE-CRAM3-NOREF:
scramble-v3 -I sam -O cram -x -p -P -t 8 IN.sam > COMP.cram
• To decompress using SCRAMBLE-CRAM3-NOREF:
scramble-v3 -I cram -O sam -x -p -P -t 8 COMP.cram > OUT.sam
• To compress using SCRAMBLE-CRAM3-EMBREF:
scramble-v3 -I sam -O cram -e -p -P -m -r REF.fasta -t 8 IN.sam > COMP.cram
• To decompress using SCRAMBLE-CRAM3-EMBREF:
scramble-v3 -I cram -O sam -e -p -P -m -t 8 COMP.cram > OUT.sam

DEEZ We tested DEEZ [71] in version 1.1, which was downloaded from http://sfu-compbio.github.io/deez/. We used the following commands:
• To compress using DEEZ-REF:
deez -t 8 -r REF.fasta IN.sam -o COMP.dz -v
• To decompress using DEEZ-REF:
deez -t 8 -r REF.fasta COMP.dz -o OUT.sam
• To compress using DEEZ-NOREF:
deez -t 8 IN.sam -o COMP.dz -v
• To decompress using DEEZ-NOREF:
deez -t 8 COMP.dz -o OUT.sam

QUIP-SAM-REF We used the same QUIP binary as when compressing FASTQ files. We ran the following commands:
• To compress:
quip -r REF.fasta -c IN.sam > COMP.qp
• To decompress:
quip -d -r REF.fasta -c COMP.qp > OUT.sam
• To print the sizes of the compressed streams:
quip -l -v COMP.qp

CARGO-based solutions

We used the same CARGO framework as when compressing FASTQ files.

Container creation Before running tests, we created temporary CARGO containers to store the compressed data, separately for the WEX and WGS datasets. For WEX we created a container able to store compressed data up to 5.9 GB in size, by running the command:
cargo_tool --create-container --container-file=CONTAINER --large-block-size=8 --large-block-count=10 --small-block-size=256 --small-block-count=32

Similarly, we created a large container for the WGS dataset, able to store compressed data up to 59.6 GB in size, by running the command:
cargo_tool --create-container --container-file=CONTAINER --large-block-size=8 --large-block-count=110 --small-block-size=256 --small-block-count=32

As in the case of compressing FASTQ files, after compressing the files the container can be shrunk to adapt its size to the size of the content, or cleared.

Running CARGO The CARGO-based CARGO-SAM-* solutions were run as follows:
• To compress using CARGO-SAM-STD and CARGO-SAM-EXT:
cargo_samrecord_toolkit-* c -v -c CONTAINER -n DATASET -t 8 -b 8 -i IN.sam
• To decompress using CARGO-SAM-STD and CARGO-SAM-EXT:
cargo_samrecord_toolkit-* d -v -c CONTAINER -n DATASET -t 8 -o OUT.sam
• To compress using CARGO-SAM-REF:
cargo_samrecord_toolkit-ref c -v -c CONTAINER -n DATASET -t 8 -b 8 -a -i IN.sam
• To decompress using CARGO-SAM-REF:
cargo_samrecord_toolkit-ref d -v -c CONTAINER -n DATASET -t 8 -a -f REF.fasta.bff -o OUT.sam