Summer internship, 2019

Search for potential markers of pregnancy complications using sequencing data of circulating cfDNA from maternal plasma

Supervisors: A.S.Glotov, P.Y.Kozyulina
Intern: Alisa Morshneva

cfDNA

Circulating free DNA (cfDNA) consists of degraded DNA fragments released into the blood plasma.

[Figure: sources of cfDNA (blood cells, fat tissue, tumors, solid organs) and release mechanisms (apoptosis, necrosis, secretion); cell-free DNA fragments are ~150 bp]

Non-invasive prenatal test (NIPT)

[Figure: blood sampling -> sequencing of cfDNA from plasma (maternal cfDNA + fetal DNA) -> detection of aneuploidies. Image: www.niftytest.com]

Dataset

number of samples: > 450
age: 22 - 47
gestational age (weeks): 10 - 21
platform: Ion Torrent
format: bam
reference: hg19
average coverage: 20%
depth: 1 - 2

Pathology | Samples
HIV+ | 2
myoma | 1
T13 | 3
T18 | 6
T21 | 20
X0 | 1

Focus areas

cfDNA:

- viruses: check the samples for the presence of viral sequences
- SNPs: preliminary screening (CLINVAR)
- mtDNA (maternal + fetal): ChrM/NuclearDNA ratio
- cancer: fetal DNA is indistinguishable from tumor DNA

viruses

Task: check cfDNA data for viral reads

Snakemake pipeline:

raw BAMs -> extract unmapped reads (Samtools, bedtools) -> unmapped reads (fastq) -> KRAKEN2 -> parse -> report with viral representation
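The pipeline steps above could be sketched per sample as follows. This is only an illustration: the slides name Samtools, bedtools and KRAKEN2 but not the exact commands, so the flags, the `viral_screen_commands` helper, the output file names and the `--use-names` option are all assumptions. The function only builds the command lines and does not run them.

```python
from pathlib import Path


def viral_screen_commands(bam, kraken_db, out_dir="viral_screen"):
    """Build (but do not run) the commands for one sample.

    Hypothetical helper: file naming and flag choices are assumptions,
    not the commands used in the original Snakemake pipeline.
    """
    sample = Path(bam).stem
    out = Path(out_dir)
    unmapped_bam = out / f"{sample}.unmapped.bam"
    fastq = out / f"{sample}.unmapped.fastq"
    report = out / f"{sample}.kraken2.report.txt"
    per_read = out / f"{sample}.kraken2.out.txt"
    return [
        # -f 4 keeps only reads whose "unmapped" flag bit is set
        ["samtools", "view", "-b", "-f", "4", "-o", str(unmapped_bam), str(bam)],
        # convert the unmapped reads to FASTQ for classification
        ["bedtools", "bamtofastq", "-i", str(unmapped_bam), "-fq", str(fastq)],
        # classify the reads; --use-names prints taxon names next to taxids
        ["kraken2", "--db", str(kraken_db), "--use-names",
         "--report", str(report), "--output", str(per_read), str(fastq)],
    ]


if __name__ == "__main__":
    for cmd in viral_screen_commands("data/sample01.bam", "kraken2_db"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` or turned into the `shell:` section of a Snakemake rule.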

Top 15 most represented viruses

Patients (N = 446) | Taxon
252 | Tequatrovirus (taxid 10663)
223 | Tevenvirinae (taxid 1198136)
179 | (taxid 28883)
116 | Escherichia phage TL-2011b (taxid 1124654)
80 | Pahexavirus (taxid 1982251)
65 | Enterobacteria phage YYZ-2008 (taxid 564886)
56 | Stx2-converting phage 1717 (taxid 563769)
33 | Enterobacteria phage VT2phi_272 (taxid 936054)
27 | Enterobacteria phage SfI (taxid 1225789)
25 | Human betaherpesvirus 7 (taxid 10372)
23 | Salmonella phage RE-2010 (taxid 929814)
15 | Skunavirus (taxid 1623305)
14 | Enterobacteria phage mEp460 (taxid 1147152)
13 | Thermus IN93 (taxid 1714273)
12 | Escherichia virus T4 (taxid 10665)

Chart note: can be used for contamination tracking

Human viruses

Taxon | Patients | Percentage | Viral reads
Human betaherpesvirus 7 (taxid 10372) | 25 | 5.61% | 1 - 3
Human betaherpesvirus 5 (taxid 10359) = CMV | 9 | 2.02% | 1 - 3
Human gammaherpesvirus 4 (taxid 10376) | 6 | 1.35% | 1 - 2
Erythroparvovirus (taxid 40121) | 5 | 1.12% | 1 - 2
Human parvovirus B19 (taxid 10798) | 3 | 0.67% | 1 - 6
Human betaherpesvirus 6A (taxid 32603) | 3 | 0.67% | 1 - 6
Human gammaherpesvirus 8 (taxid 37296) | 2 | 0.45% | 1 - 2
Human betaherpesvirus 6B (taxid 32604) | 2 | 0.45% | 1 - 2
Human papillomavirus 9 (taxid 10621) | 1 | 0.22% | 1
Human papillomavirus 4 (taxid 10617) | 1 | 0.22% | 1
Human alphaherpesvirus 3 (taxid 10335) = Varicella Zoster | 1 | 0.22% | 64
Human alphaherpesvirus 1 (taxid 10298) = Herpes Simplex | 1 | 0.22% | 1
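The percentages in this table are consistent with patients / 446 (for example 25 / 446 = 5.61%). A minimal sketch of how such a per-taxon patient count might be tallied is given below. It assumes Kraken2 per-read output written with `--use-names`, where the third tab-separated column holds strings like `Human betaherpesvirus 7 (taxid 10372)`; the function names and the in-memory input format are hypothetical, not taken from the original pipeline.

```python
from collections import Counter


def taxa_in_sample(kraken_lines):
    """Return the set of taxa that received at least one classified read."""
    taxa = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        # column 1 is "C" (classified) or "U" (unclassified)
        if len(fields) >= 3 and fields[0] == "C":
            taxa.add(fields[2])
    return taxa


def patients_per_taxon(samples):
    """Map taxon -> (number of patients, percentage of all patients).

    `samples` maps a patient ID to that patient's Kraken2 output lines.
    """
    counts = Counter()
    for lines in samples.values():
        counts.update(taxa_in_sample(lines))
    n = len(samples)
    return {taxon: (k, round(100 * k / n, 2)) for taxon, k in counts.items()}


if __name__ == "__main__":
    demo = {
        "P1": ["C\tr1\tHuman betaherpesvirus 7 (taxid 10372)\t150\t10372:116",
               "U\tr2\tunclassified (taxid 0)\t150\t0:116"],
        "P2": ["C\tr1\tHomo sapiens (taxid 9606)\t150\t9606:116"],
    }
    for taxon, (k, pct) in sorted(patients_per_taxon(demo).items()):
        print(f"{taxon}\t{k}\t{pct}%")
```

Counting each taxon once per patient, rather than summing reads, matches the "Patients" column; read counts per patient would need a separate tally.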

viruses

Testing of HIV+ samples

Control (HIV-):
22014 unclassified (taxid 0)
2314 Homo sapiens (taxid 9606)

HIV+:
32650 unclassified (taxid 0)
2364 Homo sapiens (taxid 9606)
1 Human betaherpesvirus 7 (taxid 10372)

HIV+:
37007 unclassified (taxid 0)
1957 Homo sapiens (taxid 9606)
11 other sequences (taxid 28384)
6 Human parvovirus B19 (taxid 10798)
2 Erythroparvovirus (taxid 40121)
1 Enterobacteria phage YYZ-2008 (taxid 564886)

“Infection with parvovirus B19 during pregnancy can cause several serious complications in the fetus, such as fetal anemia, neurological anomalies, hydrops fetalis, and fetal death.” - Parvovirus B19 infection in pregnancy.

chrM

Task: evaluate mtDNA and nuclear DNA ratio

Snakemake pipeline:

raw BAMs -> extract MT reads and nuclear reads -> count reads -> mtDNA/nDNA ratio (RStudio)
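The ratio itself is a simple quotient of read counts. The sketch below is an alternative illustration in Python rather than the RStudio step used in the slides: it assumes the counts come from `samtools idxstats` output (tab-separated reference name, length, mapped reads, unmapped reads) and that the mitochondrial contig is named `chrM`, as in hg19. The function name is hypothetical.

```python
def mt_nuclear_ratio(idxstats_text, mt_name="chrM"):
    """Compute mapped chrM reads / mapped nuclear reads from idxstats text."""
    mt = 0
    nuclear = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # final idxstats row: reads with no reference assigned
            continue
        if name == mt_name:
            mt += int(mapped)
        else:
            nuclear += int(mapped)
    if nuclear == 0:
        raise ValueError("no mapped nuclear reads")
    return mt / nuclear


if __name__ == "__main__":
    demo = "chr1\t249250621\t900\t0\nchr2\t243199373\t100\t0\nchrM\t16571\t5\t0\n*\t0\t0\t42\n"
    print(mt_nuclear_ratio(demo))
```

Here every non-mitochondrial contig is counted as nuclear; whether alternate or unplaced contigs should be excluded is a design choice the slides do not specify.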

chrM

[Figure: N = 439]

chrM

[Figure: Streck vs EDTA]

SNPs

SNP calling (total bam files):

- nuclear SNPs: HaplotypeCaller (GATK); LOW DEPTH
- mtDNA SNPs: extract chrM, then Mutect2 (GATK)

Clinical disorders that are caused by mutations in mitochondrial DNA

Mitochondrial DNA disorder | Clinical phenotype | mtDNA genotype | Gene | Status | Inheritance
Kearns–Sayre syndrome | Progressive myopathy, ophthalmoplegia, cardiomyopathy | A single, large-scale deletion | Several deleted genes | Heteroplasmic | Usually sporadic
Pearson syndrome | Pancytopoenia, lactic acidosis | A single, large-scale deletion | Several deleted genes | Heteroplasmic | Usually sporadic
MERRF | Myoclonic epilepsy, myopathy | 8344A>G; 8356T>C | TRNK | Heteroplasmic | Maternal
NARP | Neuropathy, ataxia, retinitis pigmentosa | 8993T>G | ATP6 | Heteroplasmic | Maternal
LHON | Optic neuropathy | 3460G>A | ND1 | Hetero- or homoplasmic | Maternal
LHON | Optic neuropathy | 11778G>A | ND4 | Hetero- or homoplasmic | Maternal
LHON | Optic neuropathy | 14484T>C | ND6 | Hetero- or homoplasmic | Maternal
Myopathy and diabetes | Myopathy, weakness, diabetes | 14709T>C | TRNE | Hetero- or homoplasmic | Maternal
Sensorineural hearing loss | Deafness | 1555A>G | RNR1 | Homoplasmic | Maternal
Sensorineural hearing loss | Deafness | Individual mutations | TRNS1 | Hetero- or homoplasmic | Maternal
Exercise intolerance | Fatigue, muscle weakness | Individual mutations | CYB | Heteroplasmic | Sporadic
Fatal, infantile encephalopathy; Leigh/Leigh-like syndrome | Encephalopathy, lactic acidosis | 10158T>C; 10191T>C | ND3 | Heteroplasmic | Sporadic

Source: Robert W. Taylor and Doug M. Turnbull, 2007

mtDNA SNPs

Total: 499 SNPs

Filter DP ≥ 10: 29 SNPs
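A depth filter of this kind could be sketched as below. The slides do not say whether DP was read from the INFO column or from the per-sample FORMAT field, nor which tool applied the filter; this example assumes the INFO `DP` tag of an uncompressed VCF and uses hypothetical function names.

```python
def info_depth(info):
    """Return the integer DP value from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("DP="):
            return int(item[3:])
    return None


def filter_by_depth(vcf_lines, min_dp=10):
    """Keep variant records whose INFO DP is at least min_dp."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        dp = info_depth(fields[7])
        if dp is not None and dp >= min_dp:
            # (CHROM, POS, REF, ALT, DP)
            kept.append((fields[0], int(fields[1]), fields[3], fields[4], dp))
    return kept


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chrM\t15043\t.\tG\tA\t.\tPASS\tDP=12",
        "chrM\t100\t.\tA\tG\t.\tPASS\tAF=0.5;DP=3",
    ]
    print(filter_by_depth(demo))
```

Records without a DP tag are dropped rather than kept, which is the conservative choice for low-depth data.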

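SNPs

POS | SNP | rsID | CLNDN (CLINVAR) | CLNSIG | Number of patients
15043 | G > A | 193302985 | Familial cancer of breast | Likely pathogenic | 3
15301 | G > A | 193302991 | Familial cancer of breast | Likely pathogenic | 6
15326 | A > G | 2853508 | Familial cancer of breast | Likely pathogenic | 5
15458 | T > C | 527236185 | Familial cancer of breast | Likely pathogenic | 1
10398 | A > G | 2853826 | Resistance to Parkinson disease * | Protective | 1
13708 | G > A | 28359178 | Leber's optic atrophy | Conflicting interpretations | 1

Conclusion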

Possible applications:

- contamination control

- preliminary viral diagnostics

- preliminary mitochondrial SNP detection
