Mason2018.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
This thesis has been submitted in fulfilment of the requirements for a postgraduate degree (e.g. PhD, MPhil, DClinPsychol) at the University of Edinburgh. Please note the following terms and conditions of use: This work is protected by copyright and other intellectual property rights, which are retained by the thesis author, unless otherwise stated. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the author. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the author. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given. The abundance and diversity of endogenous retroviruses in the chicken genome Andrew Stephen Mason Thesis submitted for the degree of Doctor of Philosophy College of Medicine and Veterinary Medicine The Roslin Institute and Royal (Dick) School of Veterinary Studies University of Edinburgh 2017 Acknowledgements The completion of this PhD project and thesis has been the most academically and mentally challenging undertaking I have faced. It is without a doubt that I would have been unable to do so without the encouragement and support of a great many people, who I would like to take this opportunity to thank. Firstly, to my academic supervisors: Professor Dave Burt, Dr Paul Hocking and Dr Sam Lycett. Dave and Paul, thank you for devising this project, taking on a ‘fresh faced’ graduate, and continually supporting and driving my academic development. You’ve both been incredibly supportive of my desire to take on extra activities, whether that be a training course, public engagement or undergraduate teaching, all of which have enriched my experience. You’ve shared a wealth of knowledge and life experience with me, and for that I am grateful. Thank you also to Sam for taking me on during my final year and critically reviewing my work with fresh eyes. You’ve been a great help. Secondly, to my industrial supervisor at Hy-Line International, Dr Janet Fulton. You’ve been a great source of inspiration, opened my eyes to new ideas, but you kept me grounded and focused on the project. Thank you also for so generously hosting me during my month’s stay in Des Moines, and opening your home for games, whisky and sage advice. Thank you also to the entire Molecular Genetics Laboratory group at Hy- Line, particularly Ashlee Lund for her tireless work on the KASP assays. Thirdly, thank you to the collaborators who so generously shared data during my identification of ALVEs: Professors Chris Ashwell, Bernhard Benkel, Marc Eloit, Olivier Hanotte, Susan Lamont, Rudolf Preisinger and Douglas Rhoads. Special thanks are also due to Dr Scott Tyack from the EW group who shared ideas and troubleshooting methods during the development of the ALVE identification pipeline, and to Dr Gregor Gorjanc at Roslin for helping with modelling ALVE detection rates. I have been lucky to part of the Burt lab at Roslin, full of clever and incredibly helpful people. Thank you specifically to Richard Kuo and Dr Lel Eory for dealing with many bioinformatic questions, to Bob Paton and Dr Kasia Miedzinska-Bielecka for help and patience in the lab, and to Dr Jacqueline Smith for good advice and a dose of reality. Beyond great academic support at Roslin, I was fortunate to find a group of people in i the department (and beyond) with whom I shared laughs, cakes and beers during my project. Special mentions to Serap, Graham, Kate, Brendan, Fiona, Carys and John: you guys kept Roslin fun. Thank you for the good memories, and the best of luck to you all. I owe my Mum and Dad a huge thank you for all the love and support over the years. They always drive me to do better, support me through any decision, and even cope with my excited scientific monologuing. You’re both incredible and it’s my constant goal to make you proud. As well as the love and support of my sisters and their families, I have been so lucky during this project to see my family extend to the Hawleys, my brilliant in-laws. Thank you for welcoming me so completely into your lives. Finally, and above all others, I need to acknowledge the love and brilliance of my incredible wife, Cathy. Completing PhDs simultaneously has been a challenge, but we’ve supported each other throughout, and could put a smile on the other’s face after any long day or inconclusive result. You are absolutely everything to me, and I am so fortunate to be spending the rest of my life with you. Here’s to a life-long, post-PhD adventure, and all the amazing things to come! Chicken Lookin’ by Alex Hawley My very own motivational poster. ii Declaration I declare that this thesis has been composed solely by myself, and that it has not been submitted, in whole or in part, in any previous application for a degree or other professional qualification. Except where stated otherwise by reference or acknowledgement, the work presented is entirely my own. Andrew Mason iii iv Abstract Long terminal repeat (LTR) retrotransposons are autonomous eukaryotic repetitive elements which may elicit prolonged genomic and immunological stress on their host organism. LTR retrotransposons comprise approximately 10 % of the mammalian genome, but previous work identified only 1.35 % of the chicken genome as LTR retrotransposon sequence. This deficit appears inconsistent across birds, as studied Neoaves have contents comparable with mammals, although all birds contain only one LTR retrotransposon class: endogenous retroviruses (ERVs). One group of chicken- specific ERVs (Avian Leukosis Virus subgroup E; ALVEs) remain active and have been linked to commercially detrimental phenotypes, such as reduced lifetime egg count, but their full diversity and range of phenotypic effects are poorly understood. A novel identification pipeline, LocaTR, was developed to identify LTR retrotransposon sequences in the chicken genome. This enabled the annotation of 3.01 % of the genome, including 1,073 structurally intact elements with replicative potential. Elements were depleted within coding regions, and over 40 % of intact elements were found in clusters in gene sparse, poorly recombining regions. RNAseq analysis showed that elements were generally not expressed, but intact transcripts were identified in four cases, supporting the potential for viral recombination and retrotransposition of non-autonomous repeats. LocaTR analysis of seventy-two additional sauropsid genomes revealed highly lineage- specific repeat content, and did not support the proposed deficit in Galliformes. A second, novel bioinformatic pipeline was constructed to identify ALVE insertions in whole genome resequencing data and was applied to eight elite layer lines from Hy-Line International. Twenty ALVEs were identified and diagnostic assays were developed to validate the bioinformatic approach. Each ALVE was sequenced and characterised, with many exhibiting high structural intactness. In addition, a K locus revertant line was identified due to the unexpected presence of ALVE21, confirmed using BioNano optic maps. The ALVE identification pipeline was then applied to ninety chicken lines and 322 different ALVEs were identified, 81 % of which were novel. Overall, broilers and non-commercial chickens had a greater number of ALVEs than were found in layers. Taken together, these two analyses have enabled a thorough characterisation of both the abundance and diversity of chicken ERVs. v vi Lay Summary Retroviruses are a group of viruses which are unable to replicate themselves. Instead, during infection, these viruses insert their own genetic material into host cell DNA (the genome) where it is then replicated, producing more virus. Depending on the insertion site, these viruses can disrupt host gene function, which can lead to cancer. Furthermore, these insertions can elicit generational effects. If a retrovirus inserts within the DNA of a sperm or egg cell, the retrovirus will be passed on to the next generation as an endogenous retrovirus (ERV), and will be present in every cell of the offspring. In mammals, approximately 10 % of the genome consists of ERVs. However, in the chicken only 1.35 % of genome is derived from ERVs, even though other birds have levels equivalent to mammals when differences in genome size are considered. Despite this apparent deficit of ERVs, a chicken-specific ERV group, the ALVEs, is known to induce tumours and inhibit productivity in commercial flocks. This PhD project sought to perform a better annotation of ERVs in the chicken and other birds, characterise ALVEs in commercial flocks, and more fully identify ALVE diversity across chickens. A new annotation pipeline, LocaTR, was developed to identify ERVs in genome sequence. This pipeline was used to annotate ERVs in sixty-seven bird species, including chicken, and six reptile outgroups. The previously identified ERV content in chickens was almost doubled and the lineage analysis did not support the previously proposed deficit of ERVs in the chicken compared to other birds. Chicken ERV distribution and intactness was assessed to predict their effects on the host. Most ERVs were ancient and highly degraded, but 1,073 structurally intact ERVs were identified. A further ALVE identification pipeline was developed to identify novel ALVE insertions in eight elite layer lines from Hy-Line International. Twenty different ALVEs were identified, diagnostic tests were developed, and each insert was sequenced and characterised. Many of the inserts remain highly intact supporting a continued influence on chicken biology. Analysis of ninety additional datasets identified over three hundred ALVEs, 81 % of which were novel to this study. This PhD project has enabled a thorough characterisation of chicken ERVs, and the two pipelines are applicable to other research, such as viral-induced cancers in humans.