Characterization of Histone-Like Protein Gene Families In
CHARACTERIZATION OF HISTONE-LIKE PROTEIN GENE FAMILIES IN FRITILLARIA LILIACEA

A University Thesis Presented to the Faculty of California State University, Hayward

In Partial Fulfillment of the Requirements for the Degree Master of Science in Biological Science

By Rishali Gadkari

March, 2002

Abstract

The genome size of plants shows a thousand-fold variation, from the 150 Mb genome of Arabidopsis to the 140,000 Mb genome of Fritillaria. Much of this variation in genome size is due to changes in the amount of non-coding DNA. Genome complexity, in contrast to genome size, is determined in large part by the number of genes, but gene number and family organization are poorly characterized in most organisms. In this study I have characterized the number and type of family members from the histone-like protein gene family in Fritillaria liliacea. The experimental approach involves exhaustive polymerase chain reaction (PCR) amplification followed by cloning. Selected clones were sequenced, and phylogenetic analyses were performed to determine the relationships among family members. The histone-like gene family in Fritillaria liliacea consists of two functional genes, which may be allelic, and 12 processed pseudogenes. In contrast, the histone-like protein gene family in Arabidopsis consists of only one functional gene without any pseudogenes. During the course of this study, a new histone-like protein related gene family was identified. This gene family consists of 5-8 functional genes and 4 classical pseudogenes.

Acknowledgments

My sincere thanks to Dr. Chris Baysdorfer for the inspiration and guidance that he has given me to do this research work. I am grateful to Dr. Kelly Steele for helping me with the phylogenetic analysis of my data and to Dr. Laura Marschall for her guidance. I also want to thank my family for their support and encouragement.
Table of Contents

1. List of Tables
2. List of Figures
3. Introduction
   General Introduction
   Gene Families
   Pseudogenes
   Experimental Organism
   Histone-like Protein Gene Family
   Experimental Question
4. Material and Methods
5. Results
6. Data Analysis
7. Discussion
8. References
9. Appendix
   Raw Sequence Data
   BLASTN database search of histone-like protein genomic clones from Fritillaria liliacea
   BLASTN database search with processed pseudogene clones from Fritillaria liliacea
   Intron-exon structure of the histone-like gene and the new histone-like gene from Fritillaria liliacea

List of Tables

1. Pseudogene composition of various gene families from different plants and animals
2. PCR primers used in the analysis
3. Comparison of histone-like protein

List of Figures

1. Initial PCR amplification of Fritillaria liliacea genomic DNA
2. PCR amplification of Fritillaria liliacea genomic DNA at lower annealing temperature
3. PCR amplification of Fritillaria liliacea genomic DNA with initial and new histone-like gene specific primers
4. Graphical overview of nucleotide alignment in BLASTN database search from histone-like protein genomic clones from Fritillaria liliacea and histone-like protein cDNA from Fritillaria agrestis
5. Graphical overview of nucleotide alignment in BLASTN database search from histone-like protein processed pseudogene clones from Fritillaria liliacea and histone-like protein cDNA from Fritillaria agrestis
6. BLASTP results of histone-like protein gene of Fritillaria liliacea
7. Translation of the group 1 pseudogene clones from Fritillaria liliacea
8. Translations of the group 2 and 3 pseudogene clones from Fritillaria liliacea
9. Multiple alignment of both genomic clones of Fritillaria liliacea
10. Multiple alignment of genomic-1 clones of Fritillaria liliacea
11. Multiple alignment of genomic-2 clones of Fritillaria liliacea
12. Multiple alignment of histone-like protein from Fritillaria liliacea, Fritillaria agrestis and Lilium longiflorum
13. Multiple alignment of histone-like protein group 1 pseudogene clones in Fritillaria liliacea
14. Multiple alignment of histone-like protein group 2 pseudogene clones in Fritillaria liliacea
15. Multiple alignment of histone-like protein group 3 pseudogene clones in Fritillaria liliacea
16. Multiple alignment of group 1, group 2 and group 3 pseudogenes
17. Phylogenetic tree of histone-like protein gene family from Fritillaria liliacea
18. BLASTN database search of new histone-like protein from Fritillaria liliacea
19. Graphical overview of histone-like protein gene and new histone-like protein gene alignment from Fritillaria liliacea
20. Multiple alignment of new histone-like protein
21. Multiple alignment of new histone-like group 1 clones from Fritillaria liliacea
22. Multiple alignment of new histone-like group 2 clones from Fritillaria liliacea
23. Multiple alignment of new histone-like protein group 3 clones from Fritillaria liliacea
24. Multiple alignment of new histone-like gene family
25. Phylogenetic tree of new histone-like protein gene family from Fritillaria liliacea
26. Intron-exon structure of histone-like protein cDNA sequence of Fritillaria agrestis and Arabidopsis putative histone gene sequence

General Introduction

The genome size of eukaryotes shows a ten-thousand-fold variation, from 12 Mb to 140,000 Mb. This variation is probably due not to the number of genes but to the amount of non-coding DNA. For example, the yeast genome is 12 Mb and 70% of it encodes proteins. In contrast, the genome of the lungfish is 140,000 Mb and only 0.4-1.2% of it encodes proteins (Cavalier-Smith, 1985). In plants, genome size shows a 1000-fold variation between species (Joseph, 1990).
For example, the genome size of Arabidopsis thaliana is about 150 Mb, with a coding region of 31%. In contrast, the genome size of Fritillaria is 140,000 Mb, with a coding region of only 0.02% (Cavalier-Smith, 1985). Fritillaria belongs to the family Liliaceae, which includes several other genera with large genomes. These include Tulipa, which has a genome size of about 12,000-22,000 Mb, and Lilium, with a genome size of 30,000-40,000 Mb (Royal Botanic Gardens, Kew, Angiosperm DNA C-Value database, http://www.rbgkew.org.uk). These large genomes consist mainly of non-coding DNA, including tandem and dispersed repeats (Joseph, 1990). The tandem repeats include satellite DNA and rRNA genes, while dispersed repeats consist mainly of transposable elements. In Lilium henryi there are more than 13,000 copies of the transposable element del1 (an LTR retrotransposon), which make up about 0.4% of the genome, while in Lilium longiflorum 1% of the genome consists of del1 elements (Smyth, 1991; Joseph, 1990). Another transposable element, del2 (a non-LTR retrotransposon), accounts for 4% of the Lilium speciosum genome and is present in approximately 250,000 copies. Although 4% is a small fraction of the large Lilium genome, this amount of sequence is equivalent to ten copies of the A. thaliana genome (Smyth, 1991; Leeton, 1993). These examples show that repeated sequences, mainly retrotransposons, can be responsible for increases in genome size in the family Liliaceae.

Gene Families

As mentioned above, wide differences in genome size are probably due primarily to changes in the amount of repetitive DNA, with changes in gene number being a less important factor. Nonetheless, gene numbers do vary somewhat within major taxonomic groups, and an increase in gene number is one of the major factors influencing genome complexity. Three different mechanisms have been proposed for increasing gene numbers (Holland, 1999).

1. Gene duplication: Duplication of an ancestral gene can arise by unequal crossover during recombination in a germ cell precursor or by replication slippage. Tandem gene duplication creates gene clusters, but translocations can break such clusters and scatter their constituent genes around the genome. For example, the beta globin and Hox gene clusters have remained intact in many evolutionary lineages. Conversely, the RUNX (Runt-related transcription factor) gene family, which encodes human leukemia-associated transcription factors, consists of three genes dispersed on different chromosomes. Members of this family have a 'runt domain' (RD), which directs the binding of RUNX transcription factors to the consensus sequence of target genes. RUNX1 is located on human chromosome 21, RUNX2 on human chromosome 6 and RUNX3 on human chromosome 1 (Levanon, 2001).

2. Whole genome duplication (polyploidy): Polyploidy events have clearly played a role in the evolution of some eukaryotic genomes, such as those of maize and yeast (Shimeld, 1999). Polyploidy is common in plants but rare in animals. Two types of polyploidy events are described: autopolyploidy and allopolyploidy. Autopolyploidy occurs within one species by chromosomal doubling and can be seen in tetraploid cranberry, Vaccinium oxycoccos (Mahy, 2000), and gooseberry-leaved alumroot, Heuchera grossulariifolia (Segraves, 1999). Allopolyploidy is the result of fusion between parents of different species and may arise from the fusion of abnormal gametes or from the fusion of normal haploid gametes followed by chromosomal doubling. Examples of allopolyploidy are cotton, Gossypium hirsutum (Cronn, 1999), and wheat, Triticum aestivum (Leitch, 1997).

3. Retrotransposition: This is a mechanism where processed RNA is reverse transcribed