Making Sense out of Y Chromosome Polymorphisms
Or why males are so complicated...
Thomas Krahn
Human Y Chromosome Basics
➔Only in males (exceptions)
➔Inherited in strict paternal line
➔About 58 million bases long
➔Only ~27 Mbases sequenced
➔Highly repetitive
➔Contains pseudoautosomal regions
➔Largest palindromes in human genome
Large Scale ChrY Changes
➔Insertions / Deletions
➔Whole chromosome duplications
➔Ring Y chromosomes
➔Inversions
➔Translocations / Fusion chromosomes
➔Microscopic karyotype
➔FISH with target-specific fluorescent probes
➔Male infertility
➔Gender determination (Sports / Olympics)
Turner Syndrome 45,XO:46,XY:46,XX = 50:30:20
Peter H. Vogt: AZF deletions and Y chromosomal haplogroups. Hum. Reprod. Update 11 (4): 319-336. doi: 10.1093/humupd/dmi017
Premi S, Srivastava J, Panneer G, Ali S (2008) Startling Mosaicism of the Y-Chromosome and Tandem Duplication of the SRY and DAZ Genes in Patients with Turner Syndrome. PLoS ONE 3(11): e3796. doi:10.1371/journal.pone.0003796
Y Chromosome Repeats
➔Y-STR (DYS19, DYS385, DYF399)
➔Mini satellites (MSY)
➔Inverted repeats
➔Palindromes
➔Multi palindromes
➔Parallel repeats (TSPY)
Schaller F, Fernandes AM, Hodler C, Münch C, Pasantes JJ, Rietschel W, Schempp W (2010) Y chromosomal variation tracks the evolution of mating systems in chimpanzee and bonobo. PLoS ONE 5(9): e12482. doi:10.1371/journal.pone.0012482
Peter H. Vogt: AZF deletions and Y chromosomal haplogroups. Hum. Reprod. Update 11 (4): 319-336. doi: 10.1093/humupd/dmi017
Y-STR
➔Classical paternal line sibling test
➔Interesting for genealogists (surname correlation)
➔Isolation of a male profile from a mixed trace
➔No contamination problems with female lab personnel
➔Ready-made multiplex kits available (Powerplex Y, Yfiler, Argus Y)
➔Number of markers not sufficient for genealogists, who demand higher resolution
Adding More Markers to PPY
DYS426 and DYS388 are usually slow mutators, but in some haplogroups their mutation rate rises sharply. They have been in the FTDNA database right from the start, but they are absent from the PPY (Powerplex Y) kit.
PPY has some gaps in the JOE and TMR dye channels, just enough to fit DYS426 and DYS388.
More single copy Y-STRs
➔Quick & easy to score
➔Not severely influenced by recombination
➔Easy and understandable comparisons for genealogists
➔Plenty of Y-STRs published
➔Many of them have standardized nomenclature (NIST)
➔FTDNA was always the market leader in number of Y-STRs (12, 25, 37, 67 and 111 marker panels plus specialty Y-STRs)
➔My goal was always to have ALL markers that the competitors had so that FTDNA customers could compare with all databases.
Why So Many Y-STR?
➔Huge surname projects with 800+ family members
➔Find splits in closely related Y lines
➔Predict haplogroups from Y-STR haplotypes
➔Consistency checks across panels
➔Precisely map Y chromosome deletions
Special STRs: Multi Copy Y-STR
DYS725: Difficult-to-interpret dinucleotide repeat, but just a few hundred bp next to DYS464. Good to verify unusual DYS464 results.
DYF408: 188 bp segment doesn't actually contain STR repeat units. Good to calibrate molar equivalents.
DYF397: Asymmetric P1/P3 palindromic Y-STR, 2 copies on P1 and 2 copies on P3. Good to distinguish different deletions / duplications.
DYS385, DYS464, DYF399, DYS425, DYF408
DYS385 Kittler Protocol
Kittler R, Erler A, Brauer S, Stoneking M, Kayser M (2003) Apparent intrachromosomal exchange on the human Y chromosome explained by population history. Eur. J. Hum. Genet. 11(4): 304-14.
Using Adjacent SNPs to Separate Loci of Multicopy Y-STRs
[Figure: panels labeled Fluorescein, JOE and TAMRA]
DYS464 Extended Test (DYS464X)
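In the DYS464X notation used in the typing table for this test, each allele is written as a repeat count followed by the type of the adjacent SNP (c or g), for example 15c-15c-17c-17g. The following is only a minimal sketch of how such a profile string could be split and its C-type and G-type copies tallied. The function names are hypothetical and not part of any FTDNA tool, and partial repeats such as 15.3 are kept as text rather than interpreted.

```python
import re
from collections import Counter

# One allele: repeat count (optionally with a partial repeat, e.g. 15.3)
# followed by the adjacent SNP type, c or g.
ALLELE = re.compile(r"^(\d+(?:\.\d+)?)([cg])$")


def parse_dys464x(profile):
    """Split e.g. '15c-15c-17c-17g' into [('15', 'c'), ('15', 'c'), ...]."""
    alleles = []
    for token in profile.strip().split("-"):
        match = ALLELE.match(token)
        if match is None:
            raise ValueError(f"unrecognized DYS464X allele: {token!r}")
        alleles.append((match.group(1), match.group(2)))
    return alleles


def type_counts(profile):
    """Count how many copies carry the C-type and the G-type SNP."""
    return Counter(snp_type for _, snp_type in parse_dys464x(profile))


def matches_typical_r1b(profile):
    """True if the profile has exactly 3 C-type copies and 1 G-type copy."""
    counts = type_counts(profile)
    return counts["c"] == 3 and counts["g"] == 1


if __name__ == "__main__":
    for profile in ("15c-15c-17c-17g", "15c-15c-15c-15c", "11g-13g-13g-16g"):
        print(profile, dict(type_counts(profile)), matches_typical_r1b(profile))
```

A profile that fails the 3 C-type plus 1 G-type check is not necessarily outside R1b; it only flags a pattern that deviates from the usual one.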
Typing of DYS464X

Y hgroup   DYS464
A          11g-13g-13g-16g
E          14g-15.3g-17g-18g
E3b1       14g-15.3g-17g-18g
G          13g-14g-15g-15g
G2*        12g-12g-12g-13g
I          12g-14g-15g-16g
I1a        12g-14g-14g-16g
I1a3       12g-12g-14g-14g-15g-16g
I1b        11g-14g-14g-14g
I1b        11g-14g-14g-15g
I1b2a      11g-14g-14g-15g
I1b2a1     11g-11g-14g-15g
I1c        14g-15g-15g-16g
J2a1*      12g-13g-15g-16g-16g-16g
N          14g-14.3g
R1a1*      12g-15g-15g-16g
R1b        16c-16c-16g-16g
R1b        15c-16c
R1b        15c-15c-17c-17g
R1b        14c-16c-17c-17g
R1b        14c-15c-16g-17c
R1b        16c-16g
R1b        15c-15c-16c-16g
R1b        14c-15c-17c-17g
R1b        15c-16c-17g-17g
R1b        15c-16c
R1b        15c-15c-15c-15c
R1b        15c-15c-17c-17g
R1b        15c-17c-17c-18g
R1b        15c-15c-17c-18g
R1b        15c-15c-16g-17c
R1b        16c-16c-17c-17g
R1b        14c-15c-15c-15g
R1b        15c-15c-16c-18g
R1b        15c-15c-16c-17.1g
R1b        15c-15c-16g-17c
R1b        13c-15c-17c-17g
R1b        15c-15c-17g-17g
R1b        15c-16c-16c-18g
R1b        15c-15c-16c-17c

R1b usually has 3 C-type alleles and one G-type allele.
Other haplogroups have only G-type alleles.
Exceptions are most likely products of recLOH.
Palindromic Map
RecLOH
[Figure: palindromic map from centromere to telomere around P1, with both palindrome arms carrying DYF371, DYF399, DYS464, DYF397, DYF401, DYF387, DYS459, DYF385, DYS724, DYF408 (188 bp) and DYS725; copies are labeled C-type or T-type]
Recombination-driven Loss Of Heterozygosity
P1/P2 Deletion Mechanism
Symmetry in the red/red (P1/P2) region allows for another irregular conformation:
[Figure: circle conformation with the recombination breakpoint marked. Labeled markers: DYF397 (P3), DYF399 (T-type, ins G), DYS464 (C-type and G), DYS725, DYF408 (188 bp), DYF371, DYS724, DYF385, DYF387, DYF401]
P1/P2 Deletion Mechanism
The circular DNA molecule can't replicate on its own and gets lost in the next cell cycle
[Figure: the resulting deletion. Labeled markers: DYF397 (P3), DYF399 (T-type, ins G), DYS464 (G), DYS725, DYF408 (188 bp), DYF371, DYS724, DYF385, DYS459, DYF387, DYF401]
Special Y-STRs: DYS389
DYS389 I+II fusion repeat observed
TCTG TCTA TCTG TCTA
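The motif order above (TCTG, TCTA, TCTG, TCTA) can be made concrete with a small block counter. This is a minimal sketch only: it lists consecutive tandem-repeat blocks of the two motifs in a sequence string. The function name, the `min_repeats` threshold and the test sequence are assumptions made for illustration; the sequence is synthetic and its repeat counts are invented, not real DYS389 data.

```python
import re


def repeat_blocks(sequence, motifs=("TCTG", "TCTA"), min_repeats=2):
    """Return tandem-repeat blocks as (motif, repeat_count) in sequence order."""
    # One alternative per motif, each matching one or more back-to-back copies.
    pattern = re.compile("|".join(f"(?:{re.escape(m)})+" for m in motifs))
    blocks = []
    for match in pattern.finditer(sequence.upper()):
        run = match.group(0)
        motif = next(m for m in motifs if run.startswith(m))
        count = len(run) // len(motif)
        if count >= min_repeats:  # ignore isolated chance occurrences
            blocks.append((motif, count))
    return blocks


if __name__ == "__main__":
    # Synthetic example: flank, two repeat stretches, spacer, two more stretches.
    synthetic = (
        "AAGG" + "TCTG" * 3 + "TCTA" * 10
        + "CCATGG" + "TCTG" * 3 + "TCTA" * 12 + "TTCA"
    )
    for motif, count in repeat_blocks(synthetic):
        print(motif, count)
```

A real allele call works from fragment lengths or sequence reads with validated flanking primers; the sketch only shows how the four-block structure can be read off a sequence.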
SNPs are also affected by ChrY Self-Recombination
[Figure: L88 region in haplogroup J-L26/L27]
[Figure: L88 region in haplogroup E-M2]
[Figure: ChrX and ChrY compared at L88. Caption: L88 region of highly similar ChrX sequence]
Y-SNPs and Haplogroups
➔Haplogroups are defined by "stable" Y-SNPs
➔YCC haplogroup tree (most parsimonious tree)
➔Hundreds of refinements and additions
➔The same characteristic mutation often shows up in completely distinct branches of the tree (.2)
➔Parallel and back mutations happen in real life
➔Those can often be explained by recombination events
Keeping Track of New Y-SNPs and Y Tree Changes
➔Ymap Y chromosome browser contains information about most published Y markers
➔Don't add new marker names when they already exist
➔Info about location, base change, primers, hg association and palindrome position
➔Based on gbrowse
➔Instantly synchronized with our LIMS db
http://ymap.ftdna.com
➔Ytree (Draft Y chromosome tree)
➔Node based structure
➔New SNPs found are instantly added
➔Automatically keeps a traceable change log
http://ytree.ftdna.com
Walk Through the Y Project
[Figure: bar chart by haplogroup (A to T), vertical axis 0 to 90. Caption: 230 WTY participants from mainly European haplogroups]
Coverage (currently) ~200 kb of Sanger sequences
On average 1.2 new SNPs found per participant
Verification and mapping of new mutations on Ytree
Designing PCR Primers for ChrY
➔Input: segments / target location
➔fastacmd: extract target +/- 500 bp; preset Primer3 parameters
➔BLAST vs. all human chromosomes
➔Pick best 1000 hits (exclude identity)
➔fastacmd and create mispriming library
➔Primer3
➔Good primers found? If no or maybe: manually change parameters and rerun Primer3. If yes: continue
➔Output primers with M13 tails
ChrY Self Homology
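Three of the bookkeeping steps in the primer design flow above (extending the target by 500 bp on each side, dropping the identity hit before keeping the best 1000 BLAST hits for the mispriming library, and adding M13 to the final primers) can be sketched as plain functions. This is an illustrative sketch under stated assumptions, not the pipeline's actual code: the `Hit` record and all function names are hypothetical, "exclude identity" is read here as removing the hit that overlaps the target itself, and the M13 sequences are the widely used universal sequencing primers, which the slides do not spell out. The fastacmd, BLAST and Primer3 calls themselves are left out.

```python
from dataclasses import dataclass

# Common M13 universal sequencing primers (assumed; the slides only say "M13").
M13_FORWARD = "TGTAAAACGACGGCCAGT"
M13_REVERSE = "CAGGAAACAGCTATGACC"


@dataclass(frozen=True)
class Hit:
    """Hypothetical minimal record for one BLAST hit."""
    chrom: str
    start: int
    end: int
    bitscore: float


def flank_window(start, end, flank=500):
    """Extend a 1-based inclusive target by `flank` bases on each side."""
    return max(1, start - flank), end + flank


def mispriming_candidates(hits, chrom, start, end, limit=1000):
    """Drop the target's own hit and keep the `limit` best-scoring others."""
    def is_identity(hit):
        return hit.chrom == chrom and hit.start <= end and hit.end >= start

    others = [hit for hit in hits if not is_identity(hit)]
    return sorted(others, key=lambda hit: hit.bitscore, reverse=True)[:limit]


def add_m13_tails(forward, reverse):
    """Prefix locus-specific primers with M13 tails (5' to 3')."""
    return M13_FORWARD + forward, M13_REVERSE + reverse


if __name__ == "__main__":
    window = flank_window(1200, 1500)
    hits = [
        Hit("chrY", 700, 2000, 2400.0),      # the target itself
        Hit("chrX", 5000, 6300, 1900.0),
        Hit("chrY", 90000, 91300, 2100.0),   # a paralogous ChrY copy
    ]
    print(window)
    print(mispriming_candidates(hits, "chrY", *window))
    print(add_m13_tails("ACGTACGTACGTACGTACGT", "TTGGCCAATTGGCCAATTGG"))
```

Keeping paralogous ChrY copies in the mispriming library matters on this chromosome, since much of it is similar to itself and to other chromosomes.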
ChrY Similarity to Other Chromosomes
Design of a ChrY Library Enrichment Micro-Array
NimbleGen Titanium Sequence Capture 385K Array
454 Sequencing of Enriched Y Libs
Nextera Lib >> NimbleGen Y enrichment >> 454
From 563 to 125602 unique ChrY-matching reads per 1/8 region (roughly a 223-fold increase)
Next Gen ChrY Sequencing as a Commercial Product?
➔To enrich or not to enrich? (low cost vs. information gain)
➔Problem with short reads (assembly, wrong mapping)
➔Verification of new mutations by Sanger sequencing
➔What can a genealogist learn from a huge package of sequencing data?
➔How can we design comparison databases with only partially overlapping datasets (sequencing gaps)?
➔Bringing all information from STR, SNP, WTY, 454, Axiom, Infinium etc. together