Making Sense out of Polymorphisms

Or why males are so complicated...

Thomas Krahn

Human Y Chromosome Basics

➔ Only in males (exceptions)
➔ Inherited in strict paternal line
➔ About 58 million bases long
➔ Only ~27 Mbases sequenced
➔ Highly repetitive
➔ Contains pseudoautosomal regions
➔ Largest palindromes in human genome

Large Scale ChrY Changes

➔ Insertions / Deletions
➔ Whole chromosome duplications
➔ Ring Y chromosomes
➔ Inversions
➔ Translocations / Fusion chromosomes

➔ Microscopic karyotype
➔ FISH with target specific fluorescent probes
➔ Male infertility
➔ Gender determination (Sports / Olympics)

Turner Syndrome 45,XO : 46,XY : 46,XX = 50:30:20

Peter H. Vogt: AZF deletions and Y chromosomal haplogroups. Hum. Reprod. Update 11 (4): 319-336. doi: 10.1093/humupd/dmi017

Premi S, Srivastava J, Panneer G, Ali S (2008) Startling Mosaicism of the Y-Chromosome and Tandem Duplication of the SRY and DAZ Genes in Patients with Turner Syndrome. PLoS ONE 3(11): e3796. doi:10.1371/journal.pone.0003796

Y Chromosome Repeats

➔ Y-STR (DYS19, DYS385, DYF399)
➔ Mini satellites (MSY)
➔ Inverted repeats
➔ Palindromes
➔ Multi palindromes
➔ Parallel repeats (TSPY)

Schaller F, Fernandes AM, Hodler C, Münch C, Pasantes JJ, Rietschel W, Schempp W (2010) Y chromosomal variation tracks the evolution of mating systems in chimpanzee and bonobo. PLoS ONE 5(9): e12482. doi:10.1371/journal.pone.0012482

Peter H. Vogt: AZF deletions and Y chromosomal haplogroups. Hum. Reprod. Update 11 (4): 319-336. doi: 10.1093/humupd/dmi017

Y-STR

➔ Classical paternal line sibling test
➔ Interesting for genealogists (surname correlation)
➔ Isolation of a male profile from a mixed trace
➔ No contamination problems with female lab personnel
➔ Ready made multiplex kits available (Powerplex Y, Yfiler, Argus Y)
➔ Number of markers not sufficient for genealogists because they demand higher resolution

Adding More Markers to PPY

DYS426 and DYS388 are usually slow mutators, but in some haplogroups their mutation frequency suddenly increases. They have been in the FTDNA database right from the start, but they are absent from the PPY kit.

PPY has some gaps in the JOE and TMR dye channels, just enough room to fill with DYS426 and DYS388.

More single copy Y-STRs

➔ Quick & easy to score
➔ Not severely influenced by recombination
➔ Easy and understandable comparisons for genealogists
➔ Plenty of Y-STRs published
➔ Many of them have standardized nomenclature (NIST)
➔ FTDNA was always the market leader in number of Y-STRs (12, 25, 37, 67 and 111 marker panels plus specialty Y-STRs)
➔ My goal was always to have ALL markers that the competitors had, so that FTDNA customers could compare with all databases.

Why So Many Y-STR?

➔ Huge surname projects with 800+ family members
➔ Find splits in closely related Y lines
➔ Predict haplogroups from Y-STR haplotypes
➔ Consistency checks across panels
➔ Precisely map Y chromosome deletions
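Finding splits in closely related Y lines comes down to comparing haplotypes marker by marker. The following is a minimal sketch, not the method used by any particular database: it assumes a simple stepwise distance (sum of absolute repeat-count differences) over the single-copy markers that both men have been tested for. The function name and all allele values are made up for illustration.

```python
def genetic_distance(hap_a, hap_b):
    """Stepwise distance over markers typed in BOTH haplotypes.

    Each haplotype is a dict {marker name: repeat count}. Markers missing
    from either haplotype (different panels) are skipped, not counted.
    """
    shared = sorted(set(hap_a) & set(hap_b))
    if not shared:
        raise ValueError("the two haplotypes have no markers in common")
    distance = sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    return distance, shared


# Hypothetical project members (values invented for this example).
member_1 = {"DYS19": 14, "DYS388": 12, "DYS426": 12, "DYS389I": 13}
member_2 = {"DYS19": 15, "DYS388": 12, "DYS426": 12, "DYS389I": 13}
member_3 = {"DYS19": 14, "DYS388": 14, "DYS426": 11}  # smaller panel

d12, compared_12 = genetic_distance(member_1, member_2)
d13, compared_13 = genetic_distance(member_1, member_3)
print(d12, len(compared_12))
print(d13, len(compared_13))
```

Reporting the number of compared markers next to the distance matters because a distance over 12 markers and the same distance over 111 markers do not mean the same thing. Multi-copy markers such as DYS464 need separate handling and are deliberately left out of this sketch.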

Special STRs: Multi Copy Y-STR

DYS725: Difficult to interpret dinucleotide repeat, but just a few 100 bp next to DYS464. Good to verify unusual DYS464 results.

DYF408: 188 bp segment doesn't actually contain STR repeat units. Good to calibrate molar equivalents.

DYF397: Asymmetric P1/P3 palindromic Y-STR with 2 copies on P1 and 2 copies on P3. Good to distinguish different deletions / duplications.

DYS385, DYS464, DYF399, DYS425, DYF408

DYS385 Kittler Protocol

Kittler R, Erler A, Brauer S, Stoneking M, Kayser M (2003) Apparent intrachromosomal exchange on the human Y chromosome explained by population history. Eur. J. Hum. Genet. 11(4): 304-14.

Using Adjacent SNPs to Separate Loci of Multicopy Y-STRs

[Figure labels: dye channels Fluorescein, JOE, TAMRA]

DYS464 Extended Test (DYS464X)

Using Adjacent SNPs to Separate Loci of Multicopy Y-STRs
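DYS464X results, as in the typing table on this slide, are written as repeat counts with a suffix for the adjacent SNP variant, for example 15c-15c-17c-17g. A minimal sketch of how such a result string could be parsed and summarized by allele type follows; the function names are hypothetical and the parsing rules are inferred only from the notation shown on the slide.

```python
import re
from collections import Counter

# One allele: repeat count (optionally a partial repeat such as 15.3 or 17.1)
# followed by the SNP type suffix c or g.
ALLELE = re.compile(r"^(\d+(?:\.\d+)?)([cg])$")


def parse_dys464x(result):
    """Split a result such as '15c-15c-17c-17g' into (repeats, type) pairs."""
    alleles = []
    for token in result.split("-"):
        match = ALLELE.match(token.strip().lower())
        if match is None:
            raise ValueError(f"unrecognised DYS464X allele: {token!r}")
        alleles.append((float(match.group(1)), match.group(2)))
    return alleles


def type_counts(result):
    """Return (number of C-type alleles, number of G-type alleles)."""
    counts = Counter(kind for _, kind in parse_dys464x(result))
    return counts["c"], counts["g"]


print(type_counts("15c-15c-17c-17g"))  # pattern the slide calls usual for R1b
print(type_counts("12g-14g-15g-16g"))  # G-type only
```

A result with three C-type alleles and one G-type allele matches the pattern described as usual for R1b, while deviations such as four C-type alleles are the cases the slide attributes to recLOH.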

Typing of DYS464X

Y hgroup   DYS464
A          11g-13g-13g-16g
E          14g-15.3g-17g-18g
E3b1       14g-15.3g-17g-18g
G          13g-14g-15g-15g
G2*        12g-12g-12g-13g
I          12g-14g-15g-16g
I1a        12g-14g-14g-16g
I1a3       12g-12g-14g-14g-15g-16g
I1b        11g-14g-14g-14g
I1b        11g-14g-14g-15g
I1b2a      11g-14g-14g-15g
I1b2a1     11g-11g-14g-15g
I1c        14g-15g-15g-16g
J2a1*      12g-13g-15g-16g-16g-16g
N          14g-14.3g
R1a1*      12g-15g-15g-16g
R1b        16c-16c-16g-16g
R1b        15c-16c
R1b        15c-15c-17c-17g
R1b        14c-16c-17c-17g
R1b        14c-15c-16g-17c
R1b        16c-16g
R1b        15c-15c-16c-16g
R1b        14c-15c-17c-17g
R1b        15c-16c-17g-17g
R1b        15c-16c
R1b        15c-15c-15c-15c
R1b        15c-15c-17c-17g
R1b        15c-17c-17c-18g
R1b        15c-15c-17c-18g
R1b        15c-15c-16g-17c
R1b        16c-16c-17c-17g
R1b        14c-15c-15c-15g
R1b        15c-15c-16c-18g
R1b        15c-15c-16c-17.1g
R1b        15c-15c-16g-17c
R1b        13c-15c-17c-17g
R1b        15c-15c-17g-17g
R1b        15c-16c-16c-18g
R1b        15c-15c-16c-17c

R1b has usually 3 C-type alleles and one G-type allele.
Other haplogroups have only G-type alleles.
Exceptions most likely products of recLOH.

Palindromic Map

RecLOH

[Figure: map around palindrome P1 between centromere and telomere. Both arms carry DYF371, DYF399, DYS464, N.N., DYF397, DYF401, DYF387, DYS459, DYF385, DYS724, DYF408 (188 bp) and DYS725, with copies annotated as C-type or T-type. Numeric labels: 9, 39, 14 on the centromere side and 10, 36, 16 on the telomere side; in the second state of the figure the centromere-side labels are overprinted with 10, 36, 16.]
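The effect of RecLOH on a test result can be illustrated with a toy model. This is only a sketch under strong assumptions: it treats the event as one palindrome arm overwriting the other along its whole length, whereas a real event may cover only part of an arm. The numeric values are the labels from the map (9, 39, 14 and 10, 36, 16); which marker each value belongs to is not recoverable here, so they are kept as anonymous positions.

```python
def rec_loh(arms, donor):
    """Copy the donor arm's allele values onto every arm (whole-arm simplification)."""
    if donor not in arms:
        raise KeyError(f"unknown arm: {donor!r}")
    template = list(arms[donor])
    return {arm: list(template) for arm in arms}


def observed_genotype(arms):
    """What a fragment-length test would report: both copies per marker, sorted."""
    return [tuple(sorted(pair)) for pair in zip(*arms.values())]


before = {"centromere_side": [9, 39, 14], "telomere_side": [10, 36, 16]}
after = rec_loh(before, donor="telomere_side")

print(observed_genotype(before))  # [(9, 10), (36, 39), (14, 16)]
print(observed_genotype(after))   # [(10, 10), (36, 36), (16, 16)]
```

The point of the sketch is that several multi-copy markers change at once in a single event, which is why a RecLOH should not be counted as several independent mutations when comparing two men.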

Recombination driven Loss Of Heterozygosity

P1/P2 Mechanism

Symmetry in the red/red (P1/P2) region allows for another irregular conformation:

[Figure: circle conformation of the P1/P2 region with the recombination breakpoint marked. Labels: P3, ins G, T-type, C-type, 188 bp. Markers shown: DYF397, DYF399, DYS464, DYS464G, DYS725, DYF408, DYF371, DYS724, DYF385, DYF387, DYF401, N.N.]

P1/P2 Deletion Mechanism

The circular DNA molecule can't replicate on its own and gets lost in the next cell cycle

[Figure: the same region with the deletion marked. Labels: P3, ins G, T-type, 188 bp, Deletion. Markers shown: DYF397, DYF399, DYS464, DYS464G, DYS725, DYF408, DYF371, DYS724, DYF385, DYS459, DYF387, DYF401, N.N.]

Special Y-STRs: DYS389

DYS389 I+II fusion repeat observed

TCTG TCTA TCTG TCTA
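The motif order above (TCTG, TCTA, TCTG, TCTA) can be made concrete by run-length encoding a sequence into blocks of identical repeat units. This is a toy sketch only: it assumes the input is already trimmed to the repeat region, in frame, and without any non-repeat spacer, and the example sequence is invented rather than taken from a real DYS389 allele.

```python
def repeat_blocks(sequence, unit_length=4):
    """Run-length encode consecutive identical repeat units.

    Returns a list of (unit, count) tuples in sequence order.
    """
    sequence = sequence.upper()
    if len(sequence) % unit_length:
        raise ValueError("sequence length is not a multiple of the unit length")
    blocks = []
    for start in range(0, len(sequence), unit_length):
        unit = sequence[start:start + unit_length]
        if blocks and blocks[-1][0] == unit:
            blocks[-1] = (unit, blocks[-1][1] + 1)
        else:
            blocks.append((unit, 1))
    return blocks


# Invented sequence with the four-block motif order shown on the slide.
toy = "TCTG" * 2 + "TCTA" * 3 + "TCTG" * 2 + "TCTA" * 2
print(repeat_blocks(toy))
```

With the block structure written out, an unusual allele can be described by which block changed length instead of only by its total fragment size.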

SNPs are also affected by ChrY Self-Recombination

L88 region in haplogroup J-L26/L27

SNPs are also affected by ChrY Self-Recombination

L88 region in haplogroup E-M2

SNPs are also affected by ChrY Self-Recombination

[Figure labels: ChrX, ChrY, L88]

L88 region of highly similar ChrX sequence

Y-SNPs and Haplogroups

➔ Haplogroups are defined by “stable” Y-SNPs
➔ YCC haplogroup tree (most parsimonious tree)
➔ Hundreds of refinements and additions
➔ The same characteristic mutation often shows up in completely distinct branches of the tree (.2)
➔ Parallel and back mutations happen in real life
➔ Those can often be explained by recombination events

Keeping Track of New Y-SNPs and Y Tree Changes

➔ Ymap Y chromosome browser contains information about most published Y markers
➔ Don't add new marker names when they already exist
➔ Info about location, base change, primers, hg association and palindrome position
➔ Based on GBrowse
➔ Instantly synchronized with our LIMS db
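The rule about not adding new marker names when they already exist can be sketched as a lookup keyed by location and base change. This is an illustration only and does not reflect the actual Ymap or LIMS schema; the class, field names, marker names and positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    position: int  # hypothetical ChrY coordinate
    ref: str       # ancestral base
    alt: str       # derived base


class MarkerRegistry:
    """Keeps one name per (position, base change) combination."""

    def __init__(self):
        self._by_variant = {}

    def register(self, name, position, ref, alt):
        """Return (marker, created). Reuses the existing marker if the
        same position and base change was registered before."""
        key = (position, ref.upper(), alt.upper())
        existing = self._by_variant.get(key)
        if existing is not None:
            return existing, False
        marker = Marker(name, position, ref.upper(), alt.upper())
        self._by_variant[key] = marker
        return marker, True


registry = MarkerRegistry()
first, created_first = registry.register("EX1", 1234567, "C", "T")
second, created_second = registry.register("EX2", 1234567, "c", "t")
print(first.name, created_first)
print(second.name, created_second)  # the earlier name is reused
```

Keying on the variant rather than the name means a rediscovered mutation resolves to its published name instead of silently gaining a synonym.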

http://ymap.ftdna.com

Keeping Track of New Y-SNPs and Y Tree Changes

➔ Ytree (Draft Y chromosome tree)
➔ Node based structure
➔ New SNPs found are instantly added
➔ Automatically keeps a traceable change log

http://ytree.ftdna.com

Walk Through the Y Project

[Bar chart: x-axis haplogroups A to T, y-axis scale 0 to 90]

Coverage (currently) ~200 kb of Sanger sequences
On average 1.2 new SNPs found per participant
Verification and mapping of new mutations on Ytree
230 WTY participants from mainly European haplogroups

Designing PCR Primers for ChrY
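Two bookkeeping steps of the workflow on this slide are easy to show in code: widening the target by +/- 500 bp before sequence extraction, and attaching M13 sequences to the final primers. This is a sketch under assumptions. The slide only says "with M13", so reading that as 5' tails and the choice of the two commonly used universal M13 sequences below are assumptions, coordinates are assumed to be 1-based and inclusive, and the primer sequences in the example are invented. The BLAST, fastacmd and Primer3 steps themselves are not reproduced here.

```python
# Commonly used universal M13 sequences (assumed; the slide does not
# state which M13 sequences are attached).
M13_FORWARD = "TGTAAAACGACGGCCAGT"
M13_REVERSE = "CAGGAAACAGCTATGACC"


def flank_window(target_start, target_end, flank=500):
    """Widen a 1-based inclusive target interval by `flank` bases per side."""
    if target_start < 1 or target_end < target_start:
        raise ValueError("invalid target interval")
    return max(1, target_start - flank), target_end + flank


def add_m13_tails(forward_primer, reverse_primer):
    """Prefix locus-specific primers with M13 tails (5' end)."""
    return (M13_FORWARD + forward_primer.upper(),
            M13_REVERSE + reverse_primer.upper())


window = flank_window(10_000, 10_050)
tailed = add_m13_tails("acgtacgtacgtacgtacgt", "ttgcaagcttgcaagcttgc")
print(window)
print(tailed)
```

Tailing every amplicon with the same pair of sequences lets one pair of universal primers sequence all products, which is presumably why the workflow ends with that step.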

Input: segments / target location; preset P3 params.
➔ fastacmd +/- 500 bp
➔ BLAST vs. all human chromosomes
➔ Pick best 1000 hits (exclude identity)
➔ fastacmd and create mispriming library
➔ Primer3
➔ Good primers found? (no / maybe / yes)
➔ If not: manually change parameters and rerun Primer3
➔ Output primers with M13

ChrY Self Homology

ChrY Similarity to Other Chromosomes

Design of a ChrY Library Enrichment Micro-Array

NimbleGen Titanium Sequence Capture 385K Array

454 Sequencing of Enriched Y Libs

Nextera Lib >> Nimblegen Y enrichment >> 454

From 563 to 125602 unique ChrY matching reads per 1/8 region (roughly a 223-fold increase)

Next Gen ChrY Sequencing as a Commercial Product?

➔ To enrich or not enrich? (low cost vs. information gain)
➔ Problem with short reads (assembly, wrong mapping)
➔ Verification of new mutations by Sanger sequencing
➔ What can a genealogist learn from a huge package of sequencing data?
➔ How can we design comparison databases with only partially overlapping datasets (sequencing gaps)?
➔ Bringing all information from STR, SNP, WTY, 454, Axiom, Infinium etc. together
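One possible starting point for the question about partially overlapping datasets is to compare two samples only at positions covered in both, and to report what was not comparable separately. The sketch below assumes each dataset is a mapping from position to called base, where a missing position means "not covered" rather than "same as reference". The function name, positions and calls are invented.

```python
def compare_calls(calls_a, calls_b):
    """Compare two {position: base} call sets on their shared positions only."""
    positions_a, positions_b = set(calls_a), set(calls_b)
    shared = positions_a & positions_b
    differences = {
        pos: (calls_a[pos], calls_b[pos])
        for pos in sorted(shared)
        if calls_a[pos] != calls_b[pos]
    }
    return {
        "compared": len(shared),
        "differences": differences,
        "only_in_a": sorted(positions_a - positions_b),  # gaps in B
        "only_in_b": sorted(positions_b - positions_a),  # gaps in A
    }


# Invented example: sample B has a sequencing gap at position 300.
sample_a = {100: "A", 200: "C", 300: "G", 400: "T"}
sample_b = {100: "A", 200: "T", 400: "T", 500: "G"}

report = compare_calls(sample_a, sample_b)
print(report)
```

Keeping the uncovered positions in the report, instead of treating them as matches, makes it explicit how much of a comparison rests on actual data.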