Cuban Amazon genome annotation
Student: Sofiia Kolchanova
Scientific advisor: Pavel Dobrynin

Introduction
• Amazona leucocephala has 5 subspecies with populations distributed across Cuba, the Bahamas and the Cayman Islands. They are characterized by different habitats, diets and unique plumage coloration patterns.
• Due to habitat loss and trapping for the wild parrot trade, the Cuban Amazon is now an endangered species.
• Genome annotation provides information that can be used for population studies, phylogenetics, and studies of the evolution and functioning of genes and gene families.

Main goal and objectives
• The main goal of this project is to annotate the already assembled genome of the Cuban parrot (Dobrynin, P., Rivera, I., and Oleksyk, T.K. Sequencing, assembly and comparative genomics analysis of Cuban amazon parrot genome (Amazona leucocephala)).
• Objectives:
1) Annotate repeats using different repeat masking tools
2) Annotate genes in the whole genome (based on homology and de novo); find core genes with CEGMA (Core Eukaryotic Genes Mapping Approach)
3) Annotate SNPs and predict their possible effects (e.g. missense, nonsense)

Assembly

Annotation of repeats
• RepeatMasker, Tandem Repeat Masker, Python
TRF: 12.1 % of the genome sequence masked
RM: 9.7 % masked
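The masked percentages reported above can be recomputed from a masked FASTA file. The following is a minimal sketch, not the project's actual script: it assumes masked bases appear either as lowercase letters (soft masking) or as N/X (hard masking), and the function name `masked_fraction` is hypothetical. Note that with hard masking, assembly gaps written as N would also be counted.

```python
from typing import Iterable


def masked_fraction(fasta_lines: Iterable[str]) -> float:
    """Return the fraction of bases that are masked.

    Assumption: a base counts as masked if it is lowercase (soft masking)
    or is N/X (hard masking). Gap Ns in the assembly are counted too.
    """
    total = 0
    masked = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        total += len(line)
        masked += sum(1 for base in line if base.islower() or base in "NX")
    return masked / total if total else 0.0


# Toy input: 20 bases, of which 4 are lowercase and 2 are N.
example = [">scaffold_1", "ACGTacgtNN", "ACGTACGTAC"]
print(f"{100 * masked_fraction(example):.1f} % masked")  # prints "30.0 % masked"
```

For a real genome the same function can be fed an open file handle, since it only iterates over lines.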
Annotation of genes
1. Annotation using homologues from the reference genome
Tools: BLASTN, Splign, Python scripts (Dobby-tools), Bedtools.
Total: 8403 genes annotated; with the ORF correction: 4984 genes
2. De novo search for genes (AUGUSTUS): 23870 genes predicted
CEGMA result: 205 conserved genes, of which 126 are complete (not partial), out of the dataset of 458 core eukaryotic genes
Intersection:
CEGMA & AUGUSTUS: 97.3 %
CEGMA & Splign: 98.1 %
Splign & AUGUSTUS: 30.1 %
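An intersection percentage between two gene sets can be illustrated with a small interval-overlap calculation. This is only a sketch under stated assumptions: the slides do not say exactly how the percentages were defined, so here the value is taken to be the share of genes in the first set that overlap at least one gene in the second set. Coordinates are assumed to be half-open, BED-style, and the name `percent_overlapping` and the toy coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (scaffold, start, end), half-open, BED-style


def percent_overlapping(a: List[Interval], b: List[Interval]) -> float:
    """Percentage of intervals in `a` that overlap at least one interval in `b`."""
    # Group the second set by scaffold so only same-scaffold pairs are compared.
    by_scaffold: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for scaffold, start, end in b:
        by_scaffold[scaffold].append((start, end))

    hits = 0
    for scaffold, start, end in a:
        candidates = by_scaffold.get(scaffold, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return 100.0 * hits / len(a) if a else 0.0


# Toy gene sets (made-up coordinates, not project data).
set_a = [("s1", 100, 200), ("s1", 500, 600), ("s2", 0, 50), ("s3", 10, 20)]
set_b = [("s1", 150, 250), ("s2", 40, 90), ("s3", 20, 30)]
print(f"{percent_overlapping(set_a, set_b):.1f} %")  # prints "50.0 %"
```

The pairwise comparison is quadratic per scaffold, which is fine for illustration; for whole-genome gene sets a sorted sweep or a tool such as Bedtools is the more practical choice.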
SNPs annotation (SnpEff)
[Figure: Coverage]

RESULTS
• Repeats (simple repeats, microsatellites, DNA transposons, retroelements, etc.) were annotated in the genome of the Cuban parrot; on average, repeats constitute about 10 % of the genomic sequence.
• A de novo search for genes (prediction) was performed, as well as homology-based annotation and verification using the dataset of core eukaryotic genes (CEGMA).
• SNPs were annotated, along with prediction of their possible effects on known genes (such as amino acid substitutions or gain of a stop codon).

Plans
• Make the annotation more precise
• Place the acquired genes against gene families from TreeFam
• Expand annotation of repeats
• Assess demographic history with the use of PSMC
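The SNP effect categories mentioned in this presentation (missense, nonsense / gain of a stop codon) can be illustrated on a single codon. The sketch below is not how SnpEff works internally; it is a simplified illustration that assumes the standard nuclear genetic code, a known reading frame, and a substitution at one codon position. The function name `classify_snp` and the effect labels are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of `codon`."""
    codon = codon.upper()
    mutated = codon[:pos] + alt.upper() + codon[pos + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid (or stop stays stop)
    if alt_aa == "*":
        return "nonsense"    # a stop codon is gained
    if ref_aa == "*":
        return "stop_lost"   # a stop codon becomes an amino acid
    return "missense"        # amino acid substitution


print(classify_snp("TGG", 2, "A"))  # TGG (Trp) -> TGA (stop): nonsense
print(classify_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu): synonymous
print(classify_snp("GAA", 0, "A"))  # GAA (Glu) -> AAA (Lys): missense
```

A real annotation additionally needs gene models to locate each SNP within a transcript and strand, which is the part a dedicated tool handles.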