Summer internship, 2019
Search for potential markers of pregnancy complications using sequencing data of cfDNA from maternal plasma
Supervisors: A.S. Glotov, P.Y. Kozyulina
Intern: Alisa Morshneva
cfDNA
Circulating free DNA (cfDNA) consists of degraded DNA fragments released into the blood plasma.
Sources: blood cells, fat tissue, tumors, solid organs
Release mechanisms: apoptosis, necrosis, secretion
cell-free DNA (~150 bp)
Non-invasive prenatal test (NIPT)
Blood sampling → sequencing of cfDNA from plasma (maternal cfDNA + fetal DNA) → detection of aneuploidies
Image: www.niftytest.com
Dataset
number of samples: > 450
age: 22 - 47
gestational age (weeks): 10 - 21
platform: Ion Torrent
format: bam
reference: hg19
average coverage: 20%
depth: 1 - 2

Pathology | Samples
HIV+ | 2
myoma | 1
T13 | 3
T18 | 6
T21 | 20
X0 | 1

Focus areas
cfDNA (maternal + fetal)
viruses: check the samples for the presence of viral sequences
SNPs: preliminary screening (CLINVAR)
mtDNA: ChrM/nuclear DNA ratio
cancer: fetal DNA is indistinguishable from tumor DNA
viruses
Task: check cfDNA data for viral reads
Snakemake pipeline:
raw BAMs → extract unmapped reads → unmapped reads (fastq) → KRAKEN2 → parse report with viral representation
Samtools, bedtools
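The report-parsing step after KRAKEN2 could look roughly like the sketch below. It assumes Kraken2's per-read output written with `--use-names`, where the third tab-separated field holds a label such as `Homo sapiens (taxid 9606)`. The function names `count_taxa` and `drop_background`, and the exclusion list, are hypothetical and not taken from the original pipeline.

```python
from collections import Counter


def count_taxa(kraken_lines):
    """Count reads per taxon label in Kraken2 per-read output.

    Assumes five tab-separated fields per line: status (C/U), read id,
    taxon label, read length, and the LCA mapping string.
    """
    counts = Counter()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        counts[fields[2]] += 1
    return counts


def drop_background(counts, background=("unclassified (taxid 0)",
                                        "Homo sapiens (taxid 9606)")):
    """Remove labels that are not of interest (hypothetical exclusion list)."""
    return Counter({taxon: n for taxon, n in counts.items()
                    if taxon not in background})


# Synthetic example lines, not real patient data.
example = [
    "U\tread1\tunclassified (taxid 0)\t150\t0:116",
    "C\tread2\tHomo sapiens (taxid 9606)\t150\t9606:116",
    "C\tread3\tHuman parvovirus B19 (taxid 10798)\t148\t10798:20 0:94",
    "C\tread4\tHuman parvovirus B19 (taxid 10798)\t151\t10798:35 0:82",
]
candidate_hits = drop_background(count_taxa(example))
```

Summing such per-sample counters over all patients would give a "number of patients per taxon" table of the kind shown on the following slides.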
Top 15 most represented viruses
N = 446

Patients | Taxon
252 | Tequatrovirus (taxid 10663)
223 | Tevenvirinae (taxid 1198136)
179 | Caudovirales (taxid 28883)
116 | Escherichia phage TL-2011b (taxid 1124654)
80 | Pahexavirus (taxid 1982251)
65 | Enterobacteria phage YYZ-2008 (taxid 564886)
56 | Stx2-converting phage 1717 (taxid 563769)
33 | Enterobacteria phage VT2phi_272 (taxid 936054)
27 | Enterobacteria phage SfI (taxid 1225789)
25 | Human betaherpesvirus 7 (taxid 10372)
23 | Salmonella phage RE-2010 (taxid 929814)
15 | Skunavirus (taxid 1623305)
14 | Enterobacteria phage mEp460 (taxid 1147152)
13 | Thermus virus IN93 (taxid 1714273)
12 | Escherichia virus T4 (taxid 10665)

Phage taxa can be used for contamination tracking.
Human viruses
Taxon | Patients | Percentage | Viral reads
Human betaherpesvirus 7 (taxid 10372) | 25 | 5.61% | 1 - 3
Human betaherpesvirus 5 (taxid 10359) = CMV | 9 | 2.02% | 1 - 3
Human gammaherpesvirus 4 (taxid 10376) | 6 | 1.35% | 1 - 2
Erythroparvovirus (taxid 40121) | 5 | 1.12% | 1 - 2
Human parvovirus B19 (taxid 10798) | 3 | 0.67% | 1 - 6
Human betaherpesvirus 6A (taxid 32603) | 3 | 0.67% | 1 - 6
Human gammaherpesvirus 8 (taxid 37296) | 2 | 0.45% | 1 - 2
Human betaherpesvirus 6B (taxid 32604) | 2 | 0.45% | 1 - 2
Human papillomavirus 9 (taxid 10621) | 1 | 0.22% | 1
Human papillomavirus 4 (taxid 10617) | 1 | 0.22% | 1
Human alphaherpesvirus 3 (taxid 10335) = Varicella Zoster | 1 | 0.22% | 64
Human alphaherpesvirus 1 (taxid 10298) = Herpes Simplex | 1 | 0.22% | 1
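The Percentage column appears to be the number of positive patients divided by the cohort size of 446. A minimal sketch of that arithmetic, with the hypothetical helper name `prevalence_percent`:

```python
def prevalence_percent(patients_with_taxon, total_patients):
    """Share of patients with at least one read for a taxon, in percent."""
    if total_patients <= 0:
        raise ValueError("total_patients must be positive")
    return round(100.0 * patients_with_taxon / total_patients, 2)


# Values taken from the table: 25 of 446 patients for Human betaherpesvirus 7.
hhv7_share = prevalence_percent(25, 446)
```

With 446 patients, a single positive patient corresponds to about 0.22%, which is why the rarest taxa all share that value.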
Testing of HIV+ samples
Control (HIV-)
22014 unclassified (taxid 0)
2314 Homo sapiens (taxid 9606)

HIV+
32650 unclassified (taxid 0)
2364 Homo sapiens (taxid 9606)
1 Human betaherpesvirus 7 (taxid 10372)

HIV+
37007 unclassified (taxid 0)
1957 Homo sapiens (taxid 9606)
11 other sequences (taxid 28384)
6 Human parvovirus B19 (taxid 10798)
2 Erythroparvovirus (taxid 40121)
1 Enterobacteria phage YYZ-2008 (taxid 564886)

"Infection with parvovirus B19 during pregnancy can cause several serious complications in the fetus, such as fetal anemia, neurological anomalies, hydrops fetalis, and fetal death." - Parvovirus B19 infection in pregnancy.
chrM
Task: evaluate mtDNA and nuclear DNA ratio
Snakemake pipeline:
raw BAMs → extract MT reads and nuclear reads → count MT reads and nuclear reads → mtDNA/nDNA ratio → R studio
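One way to obtain the read counts and the ratio is sketched below. It assumes the counts come from `samtools idxstats` output (tab-separated: reference name, length, mapped reads, unmapped reads) and that the mitochondrial contig is named `chrM`, as in hg19. The slides do not say how the counts were actually produced, and `mt_nuclear_ratio` is a hypothetical name.

```python
def mt_nuclear_ratio(idxstats_text, mt_name="chrM"):
    """Return mapped mtDNA reads divided by mapped nuclear reads.

    idxstats_text is the text output of `samtools idxstats` for one BAM.
    """
    mt_reads = 0
    nuclear_reads = 0
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        name, mapped = fields[0], int(fields[2])
        if name == "*":
            continue  # reads without a reference are not counted
        if name == mt_name:
            mt_reads += mapped
        else:
            nuclear_reads += mapped
    if nuclear_reads == 0:
        raise ValueError("no nuclear reads found")
    return mt_reads / nuclear_reads


# Synthetic counts for illustration only.
example_idxstats = (
    "chr1\t249250621\t1000\t0\n"
    "chr2\t243199373\t3000\t0\n"
    "chrM\t16571\t40\t0\n"
    "*\t0\t0\t500\n"
)
example_ratio = mt_nuclear_ratio(example_idxstats)
```

Writing one ratio per sample to a table would give the input for the downstream plots in R studio.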
chrM: N = 439
chrM: Streck vs EDTA
SNPs
SNP calling (total bam files)
nuclear SNPs: HaplotypeCaller (GATK) → LOW DEPTH
extract chrM → mtDNA SNPs: Mutect2 (GATK)
Clinical disorders that are caused by mutations in mitochondrial DNA
Mitochondrial DNA Disorder | Clinical phenotype | mtDNA genotype | Gene | Status | Inheritance
Kearns–Sayre syndrome | Progressive myopathy, ophthalmoplegia, cardiomyopathy | A single, large-scale deletion | Several deleted genes | Heteroplasmic | Usually sporadic
Pearson syndrome | Pancytopoenia, lactic acidosis | A single, large-scale deletion | Several deleted genes | Heteroplasmic | Usually sporadic
MERRF | Myoclonic epilepsy, myopathy | 8344A>G; 8356T>C | TRNK | Heteroplasmic | Maternal
NARP | Neuropathy, ataxia, retinitis pigmentosa | 8993T>G | ATP6 | Heteroplasmic | Maternal
LHON | Optic neuropathy | 3460G>A | ND1 | Hetero- or homoplasmic | Maternal
LHON | Optic neuropathy | 11778G>A | ND4 | Hetero- or homoplasmic | Maternal
LHON | Optic neuropathy | 14484T>C | ND6 | Hetero- or homoplasmic | Maternal
Myopathy and diabetes | Myopathy, weakness, diabetes | 14709T>C | TRNE | Hetero- or homoplasmic | Maternal
Sensorineural hearing loss | Deafness | 1555A>G | RNR1 | Homoplasmic | Maternal
Sensorineural hearing loss | Deafness | Individual mutations | TRNS1 | Hetero- or homoplasmic | Maternal
Exercise intolerance | Fatigue, muscle weakness | Individual mutations | CYB | Heteroplasmic | Sporadic
Fatal, infantile encephalopathy; Leigh/Leigh-like syndrome | Encephalopathy, lactic acidosis | 10158T>C; 10191T>C | ND3 | Heteroplasmic | Sporadic

Source: Robert W. Taylor and Doug M. Turnbull, 2007
mtDNA SNPs
Total: 499 SNPs
Filter DP ≥ 10: 29 SNPs
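The depth filter can be illustrated with a short sketch over plain VCF text. It assumes the depth is stored as `DP=` in the INFO column (field 8). Depending on the caller settings it could instead sit in the per-sample FORMAT field, which this sketch does not handle. The names `info_depth` and `filter_by_depth` are hypothetical.

```python
def info_depth(info_field):
    """Return the DP value from a VCF INFO string, or None if absent."""
    for item in info_field.split(";"):
        if item.startswith("DP="):
            return int(item[3:])
    return None


def filter_by_depth(vcf_lines, min_dp=10):
    """Keep VCF records whose INFO DP is at least min_dp."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines are skipped
        record = line.rstrip("\n")
        fields = record.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        depth = info_depth(fields[7])
        if depth is not None and depth >= min_dp:
            kept.append(record)
    return kept


# Synthetic records for illustration; positions reuse values from the slides.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrM\t15043\t.\tG\tA\t.\t.\tDP=14",
    "chrM\t10398\t.\tA\tG\t.\t.\tDP=4",
]
passing = filter_by_depth(example_vcf, min_dp=10)
```

At a sequencing depth of 1 - 2, most positions fall below such a threshold, which is consistent with the drop from 499 to 29 SNPs.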
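POS | SNP | CLNSIG | CLNDN (CLINVAR) | rsID | Number of patients
15043 | G > A | Likely pathogenic | Familial cancer of breast | 193302985 | 3
15301 | G > A | Likely pathogenic | Familial cancer of breast | 193302991 | 6
15326 | A > G | Likely pathogenic | Familial cancer of breast | 2853508 | 5
15458 | T > C | Likely pathogenic | Familial cancer of breast | 527236185 | 1
10398 | A > G | Protective | Resistance to Parkinson disease * | 2853826 | 1
13708 | G > A | Conflicting interpretations | Leber's optic atrophy | 28359178 | 1

Conclusion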
Possible applications:
- contamination control
- preliminary viral diagnostics
- preliminary mitochondrial SNP detection