Combining Approaches for Predicting Genomic Evolution Bassam Alkindy
Total Page:16
File Type:pdf, Size:1020Kb
Combining approaches for predicting genomic evolution Bassam Alkindy To cite this version: Bassam Alkindy. Combining approaches for predicting genomic evolution. Bioinformatics [q-bio.QM]. Université de Franche-Comté, 2015. English. NNT : 2015BESA2012. tel-01428885 HAL Id: tel-01428885 https://tel.archives-ouvertes.fr/tel-01428885 Submitted on 6 Jan 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ; /9=!9=7:"3:#3 é c o l e d o c t o r a l e s c i e n c e s p o u r l ’ i n g é n i e u r e t m i c r o t e c h n i q u e s % 0 2 $ < 1 & 2 ; 6 7 < * 1 + 0 8 ) < , 8 ' ( ; 6 Combining Approaches for Predicting Genomic Evolution Combinaison d’Approches pour Résoudre le Problème du Réarrangement de Génomes BASSAM BASIM JAMIL ALKINDY ; /9!97:"3:#3 é c o l e d o c t o r a l e s c i e n c e s p o u r l ’ i n g é n i e u r e t m i c r o t e c h n i q u e s % 0 2 $ < 1 & 2 ; 6 7 < * 1 + 0 8 ) < , 8 ' ( ; 6 N◦ X X X Combining Approaches for Predicting Genomic Evolution Combinaison d’Approches pour Résoudre le Problème du Réarrangement de Génomes A dissertation presented by BASSAM BASIM JAMIL ALKINDY and submitted to the University of Franche-Comté in partial fulfillment of the Requirements for obtaining the degree DOCTOR OF PHYLOSOPHY in speciality of Computer Science Research Unit : Laboratory of Femto-ST (SPIM) Defended in public on 17 December 2015 in front of the Jury composed from : LHASSANE IDOUMGHAR President of jury Professor, University of Haute-Alsace JEAN-PAUL COMET Reviewer Professor, University of Nice STÉPHANE CHRÉTIEN Reviewer Senior Researcher (HDR), National Physical Laboratory Mathematics, Modelling, and Simulation, UK CHRISTOPHE GUYEUX Examiner Professor, University of Franche-Comté JACQUES M. BAHI Supervisor Professor, University of Franche-Comté JEAN-FRANÇOIS COUCHOT Co-Supervisor MCF, University of Franche-Comté MICHEL SALOMON Co-Supervisor MCF, University of Franche-Comté Acknowledgement Following this work, I want to express my gratitude to allow all people who have con- tributed, each has its method, at the completion of this thesis. I want to express my deepest thanks to my supervisor Prof. Jacques M. Bahi and to my co-supervisors Dr. Jean-Francois Couchot and Dr. Michel Salomon. Words are broken me to express my gratitude. Their skills, their scientific rigidity, and clairvoyance taught me a lot. Indeed, I thank them for their organizations, and expert advice provide me they knew through- out these three years and also for their warm quality human, and in particular for the confidence they have granted to me. I would nevertheless like to thank more particularly proudly Prof. Christophe Guyeux, professor in the University of Franche-Comté, for his advisers and his scientific expert who helped and guided me a lot for the completion of this thesis. I would like to extend my sincere thanks to Lhassane Idoumghar, Professor at the Univer- sity of Haute-Alsace, for giving me the honor to president the jury. I also extend my sincere thanks to Jean-Paul Comet, Professor at the University of Nice and Stéphane Chrétien, MCF HDR, National Physical Laboratory Mathematics, Modelling, and Simulation, UK, for giving me the honor of accepting to be rapporteurs of this thesis. I would like to extend my sincere thank to the examiners: Christophe Guyeux, Professor at the University of Franche-Comté and Jean-Francois Couchot, MCF, University of Franche-Comté. I extend my warmest thanks to the Minister of Higher Education and Scientific Research in Iraq represented by Campus France, the University of Mustansiriyah, and the University of Franche-Comté, for their co-operation to finance this thesis. My gratitude and my thanks go to the crew members of AND for the friendly and warm atmosphere in which it supported me to work. Therefore thanks to Raphaël Couturier, Karine Deschinkel, Stephane Domas, Giersch Arnaud, Mourad Hakem, Ali Idrees Kad- hum, Ahmed Al-Badri, David Laiyamani, Yousra Ahmed Fadil, Abdallah Makhoul, Roxane Mallouhy, Ahmed Mostefaoui, and to whom I missed their names. My gratitude and my thanks go to the crew members of DISC for the friendly and warm atmosphere in which it supported me to work. Therefore, Thanks Olga Kouchnarenko, di- rector of DISC (Informatique des systèmes complexes) department in Besançon, Pierre- Cyrille Héam, Dominique Menetrier, Jean-Michel Caricand, Laurent Steck and to all other people if I forget his name. For their unwavering support and encouragement. I would also like express my strongly thanks to the crew of super-computer facili- ties (Mesocentre) for their generous advices and help in launching the calculations using supercomputer capabilities by installing the modules, creation the site Internet that make dreams come true. Therefore, thanks to Laurent Philippe, Kamel Mazouzi, Guillaume v vi Laville, and Cédric Clerget. I would also like express my thanks to my friends in bioinformatics team of Christophe for their kindly friendship, Thanks, Huda AL-NAYYEF, Bashar Al-Nauimi, Panisa Treep- ong. Before closing, I want to thank my dear friends including Lilia Ziane Khodja, Abbas Abdulhameed, Hamida Bouaziz, Hana M’Hemdi, Lemia Louail, Kitsiri Kizzyy Chochiang, who shared my hopes and studies, which made me comfort in the difficult moments and with whom I shared unforgettable moments of events. Dedication To my wife Huda, with my love. I am also addressing the strongest thanksgiving words to my parents, my wife’s parents, my sisters and their families, my brothers and their families, and to my lovely family, for their support and encouragement during the thesis in long years of studies. Their affection and trust lead me and guide me every day. Thank you, Mom, Dad, for making me what I am today. vii Abstract Chloroplasts is one of many types of organelles in the plant cell. They are considered to have originated from cyanobacteria through endosymbiosis, when an eukaryotic cell en- gulfed a photosynthesizing cyanobacterium, which remained and became a permanent resident in the cell. The term of chloroplast comes from the combination of plastid and chloro, meaning that it is an organelle found in plant cells that contains the chlorophyll. Chloroplast has the ability to convert water, light energy, and carbon dioxide (CO2) in chemical energy by using carbon-fixation cycle (also called Calvin Cycle, the whole pro- cess being called photosynthesis). This key role explains why chloroplasts are at the ba- sis of most trophic chains and are thus responsible for evolution and speciation. Moreover, as photosynthetic organisms release atmospheric oxygen when converting light energy in chemical one, and simultaneously produce organic molecules from carbon dioxide, they originated the breathable air and represent a mid to long term carbon storage medium. Consequently, exploring the evolutionary history of chloroplasts is of great interest, and we propose to investigate it by the mean of ancestral genomes reconstruction. This re- construction will be achieved in order to discover how the molecules have evolved over time, at which rate, and to determine whether evidences of their cyanobacteria origin can be presented by this way. This long-term objective necessitates numerous inter- mediate research advances. Among other things, it supposes to be able to apply the ancestral reconstruction on a well-supported phylogenetic tree of a representative collec- tion of chloroplastic genomes. Indeed, sister relationship of two species must be clearly established before trying to reconstruct their ancestor. Additionally, it implies to be able to detect content evolution (modification of genomes like gene loss and gain) along this accurate tree. In other words, gene content evolution on the one hand, and accurate phylogenetic inference on the other hand, must be carefully regarded in the specific case of chloroplast sequences, as the two main prerequisites in our quest of the last universal common ancestor of these chloroplasts. In detail, given a collection of genomes, it is possible to define their core genes as the common genes that are shared among all the species, while pan genome is all the genes that are present at least once (all the species have each core gene, while a pan gene is in at least one genome). The key idea behind identifying core and pan genes is to understand the evolutionary process among a given set of species: the common part (that is, the core genome) can be used when inferring the phylogenetic relationship, while accessory genes of pan genome explain to some extent each species specificity. In the case of chloroplasts, an important category of genome modification is indeed the loss of functional genes, either because they become ineffective or due to a transfer to the nucleus. Thereby a small number of gene loss among species may indicate that these species are close to each other and belong to a similar lineage, while a large loss means ix x distant lineages. More precisely, a key idea concerning phylogenetic classification is that a given DNA mu- tation shared by at least two taxa has a larger probability to be inherited from a common ancestor than to have occurred independently.