Cuban Amazon genome annotation
Student: Sofiia Kolchanova
Scientific advisor: Pavel Dobrynin

Introduction
• Amazona leucocephala has 5 subspecies, with populations distributed across Cuba, the Bahamas and the Cayman Islands. They are characterized by different habitats, diets and unique coloration patterns.
• Due to habitat loss and trapping for the wild trade, the Cuban Amazon is now an endangered species.
• Genome annotation provides information that can be employed for population studies, phylogenetics, and studies of the evolution and functioning of genes and gene families.

Main goal and objectives
• The main goal of this project is to annotate the already assembled genome of the Cuban parrot (Dobrynin, P., Rivera, I., and Oleksyk, T.K. Sequencing, assembly and comparative genomics analysis of Cuban genome (Amazona leucocephala)).
• Objectives:
1) Annotate repeats using different repeat masking tools
2) Annotate genes in the whole genome (based on homology and de novo); find core genes with CEGMA (Core Eukaryotic Genes Mapping Approach)
3) Annotate SNPs and predict their possible effects (e.g. missense, nonsense)

Assembly

Annotation of repeats

• Tools: RepeatMasker (RM), Tandem Repeats Finder (TRF), Python
• TRF: 12.1 % of the genome sequence masked
• RM: 9.7 % masked
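The masked percentages above can be checked directly on a masked FASTA file. The following is a minimal sketch, assuming the masked bases are written as lowercase letters (soft-masking) or as N (hard-masking). The function name `masked_fraction` and the toy sequence are hypothetical and are not part of the project's scripts.

```python
from io import StringIO


def masked_fraction(fasta_lines):
    """Return the fraction of bases that are masked (lowercase or N).

    Assumption: soft-masked bases are lowercase, hard-masked bases are N.
    """
    total = 0
    masked = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip empty lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower() or base in "Nn")
    return masked / total if total else 0.0


# Toy input: 20 bases, 4 of them soft-masked and 2 hard-masked.
example = StringIO(">scaffold_1\nACGTacgtNN\nACGTACGTAC\n")
print(f"{100 * masked_fraction(example):.1f} % masked")
```

For a real assembly, an open file handle can be passed in place of the `StringIO` object, since the function only iterates over lines.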

Annotation of genes

1. Annotation using homologues from the reference genome
Tools: BLASTN, Splign, Python scripts (Dobby-tools), Bedtools.
Total: 8403 genes annotated
With the ORF correction: 4984 genes
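The slides do not say how the ORF correction was done. As an illustration only, the sketch below assumes it means keeping gene models whose coding sequence forms an intact open reading frame: a length that is a multiple of three, an ATG start, a terminal stop codon and no internal stops. The function `has_intact_orf` is hypothetical and is not taken from Dobby-tools.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_intact_orf(cds):
    """Check whether a coding sequence is a complete, uninterrupted ORF.

    Assumption: 'intact' means ATG start, terminal stop codon,
    length divisible by 3 and no in-frame internal stop codons.
    """
    cds = cds.upper()
    # At least a start and a stop codon, and a whole number of codons.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # No premature stop codon between the start and the final stop.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


# Toy gene models: only the first one passes the filter.
models = {"gene_a": "ATGAAATGGTAA", "gene_b": "ATGTAAAAATAA", "gene_c": "ATGAAATA"}
kept = [name for name, seq in models.items() if has_intact_orf(seq)]
print(kept)
```

A filter of this kind would explain a drop in gene count after correction, but the actual criteria used in the project may differ.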

2. De novo search for genes (AUGUSTUS): 23870 genes predicted

Annotation of genes
CEGMA result: 205 conserved genes, of which 126 are complete (not partial), out of the dataset of 458 core eukaryotic genes

Intersection:
CEGMA & AUGUSTUS: 97.3 %
CEGMA & Splign: 98.1 %
Splign & AUGUSTUS: 30.1 %

SNPs annotation
SnpEff
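To illustrate what predicting the effect of a SNP involves, here is a toy sketch that classifies a single-base substitution in a coding sequence as synonymous, missense, nonsense or stop-lost using the standard genetic code. It is not how SnpEff is implemented, and it ignores strand, introns and gene models. The function `classify_snp` and its 0-based `pos` convention are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of a forward-strand CDS."""
    cds = cds.upper()
    alt = alt.upper()
    offset = pos % 3                      # position of the SNP inside its codon
    start = pos - offset                  # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"                 # gain of a stop codon
    if ref_aa == "*":
        return "stop_lost"
    return "missense"                     # amino acid substitution


cds = "ATGAAATGGTAA"                      # codons: ATG AAA TGG TAA
for pos, alt in [(3, "T"), (3, "G"), (5, "G"), (9, "C")]:
    print(pos, alt, classify_snp(cds, pos, alt))
```

In a real annotation run the codon context comes from the gene models (for example a GFF file), which is why the SNP annotation depends on the gene annotation steps before it.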

SNPs annotation (SnpEff): coverage

RESULTS
• Repeats (simple repeats, microsatellites, DNA transposons, retroelements, etc.) were annotated in the genome of the Cuban parrot; on average, repeats constitute about 10 % of the genomic sequence.
• A de novo search for genes (prediction) was performed, as well as homology-based annotation and verification using the dataset of core eukaryotic genes (CEGMA).
• SNPs were annotated, along with prediction of their possible effects on known genes (such as amino acid substitutions or gain of a stop codon).

Plans

• Make the annotation more precise
• Place the acquired genes against gene families from TreeFam
• Expand the annotation of repeats
• Assess demographic history using PSMC