Candidate Polymorphisms (and emerging genetic features) Associated With Severe Male Factor Infertility

Kenneth I. Aston PhD
University of Utah, USA

Outline

• Background: genetics of complex disease
• Review of male infertility SNP literature
• Our GWAS and follow-up studies
• Emerging data
  – CNVs/ROHs
  – Genomic instability?
• Future directions
• Conclusions

Background

• Point Mutation
  – Single base change that occurs rarely in the population
  – Can be synonymous, missense, or nonsense

• Single Nucleotide Polymorphism (SNP)
  – Single base change that occurs in a significant proportion of the population (>1%; evolutionarily successful mutation)
  – Can be synonymous, missense, or nonsense

• Copy Number Variant (CNV)
  – Segment of DNA >1kb present at a variable copy number compared with a reference genome (CNVs encompass >10% of the genome)
  – Can be benign, or can affect dosage or function

Disease models

• Common Disease, Common Variant
  – SNPs, common CNVs with low penetrance

• Common Disease, Rare Variant
  – Point mutations, rare CNVs with low to high penetrance

Identifying the genetic basis for common diseases

• Cytogenetic approaches (mid-1900s)

• Linkage analysis and positional cloning (1980s)

• Gene re-sequencing studies (1990s)

• Microarray-based genome-wide association studies (2000s)

• Exome and whole genome sequencing (2010s)

Gene Re-sequencing

[Figure: candidate genes examined in re-sequencing studies, 2006/2010. Category labels: Development/Differentiation, Sperm Function, Endocrinopathies, Spermatogenesis, Meiosis. Genes shown: SRY INSL3 SOX9 LGR8 PRM1 CSNK2A2 CFTR cKIT PRM2 APOB BMP4 BMP8 TNP1 ADAM2,3 SPERM1 ROS GNRH KAL AHR ARNT AHRR ART3 AR FSH LH SRD5 CREM ACT (FHL5) LHR FSHR DAZL ODC LIPE SBF1 UBE2B OAZ3 ESR2 RBM BAX ACE GOPC LIMK2 LMTK2 YBX2 NANOS3 PARP2 PRDM9 CSNK2A2 DDX25 SPO11 DMC1 H1FNT TAF7L SCP3 MSH4 TBPL1 TNP2 MLH1 SYCP3 FASLG FAS BRCA2 ERCC1 MTRR YSSK4 FKBP6 LIMK2 IL1B MTHFR MEI1 MLH3 TSSK4 UTP14 MSH5 REC8 TNF BCL2L2 UBE2B FUS HMGB2 PPP1CC STYX]

Simoni meta-analysis

Simoni, RBM Online 15(6), 643-658 (2007)

Pitfalls of gene re-sequencing studies

• Poor study design
• Small sample size
• Mixed ethnicity
• Poorly phenotyped samples
• Lack of replication
• Focused: can miss regions not known to be important for spermatogenesis

Genome-Wide Association Studies (GWAS)

– Performed for many common, complex diseases

– Adequately powered studies generally include 100s to 1000s of cases and controls

– Hundreds of risk SNPs identified (most with odds ratios of 1.1-1.5)

– Many of the SNPs reported in these studies are located in gene deserts or regions of unknown function.
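To illustrate why odds ratios in the 1.1-1.5 range call for hundreds to thousands of cases and controls, here is a minimal sketch that computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval. The allele counts are hypothetical and chosen only to give an odds ratio near 1.3; the function name `odds_ratio_ci` is likewise an illustration, not part of any study pipeline.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts from a 2x2 table (risk vs. other allele,
    cases vs. controls). z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 100 cases and 100 controls (200 alleles per group).
small = odds_ratio_ci(72, 128, 60, 140)

# Same allele frequencies, but 5000 cases and 5000 controls.
large = odds_ratio_ci(3600, 6400, 3000, 7000)

for label, (odds_ratio, lower, upper) in (("small", small), ("large", large)):
    print(f"{label}: OR={odds_ratio:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

With these made-up counts the point estimate is identical in both cohorts, but only the larger cohort yields an interval that excludes 1.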

Manolio et al. Nature 461(8) 2009

Pilot GWAS to identify novel variants associated with male infertility

• Normozoospermic controls (n = 80)

• Severe oligozoospermics (< 5 × 10^6/ml; n = 52)

• Non-obstructive azoospermics (n = 40)

• HapMap reference samples in triplicate (QC; n = 2)

Genomic Microarray Analysis

• DNA purified from blood.

• Analyzed on Illumina 370K chips.
  – 318,000 SNP probes
  – 52,000 intensity-only probes

Results: Associated SNPs

• 16 SNPs associated with azoospermia

• 4 SNPs associated with oligozoospermia

Follow-up SNP study

• Attempt to validate SNP associations in pilot microarray study
• Genotyped 377 additional samples across selected SNPs
  – 158 controls
  – 139 severe oligozoospermics
  – 80 azoospermics
• University of Utah, University of Florence, Fundacio Puigvert

Follow-up SNP study

• Illumina BeadXpress assay for genotyping

– 84 GWAS SNPs

– 21 SNPs from previously published re-sequencing studies reporting significant association with male infertility

– 67 non-synonymous SNPs in genes important in spermatogenesis

Results

• Improved associations for four microarray-significant SNPs (4/70)

• Significant associations for 1/21 published SNPs (FASLG; P < 0.05)

• 4/67 spermatogenesis gene SNPs (TEX15, JMJD1A, BRDT; P < 0.05)

Conclusions regarding common variants and male infertility

• Few common variants identified that confer risk (e.g. gr/gr del, MTHFR SNPs, few possible GWAS SNPs)

• The effect size of SNPs reported in the literature is generally quite small (O.R. < 1.5)

• Analysis of several thousand cases and controls necessary to gauge the true involvement of SNPs and their impact on spermatogenesis

What next?

• Other disease models suggest rare variants are a more promising avenue for future study
  – Point mutations
  – Copy number variation (CNV)

“Rare” variants associated with spermatogenic defects

• DNAL1, DNAH5, DNAI1 mutations for primary ciliary dyskinesia (Mazor et al. Am J Hum Genet 2011)

• DPY19L2 homozygous deletion for total globozoospermia (Harbuz et al. and Koscinski et al. Am J Hum Genet 2011)

• NR5A1 mutation for spermatogenic failure (Bashamboo et al. Am J Hum Genet 2010)

CNVs and Male Infertility

• 11q24.2q25 (Gohring et al., 2008)
  – Single case study
  – Duplication
  – Hypogonadism
  – Azoospermia

• SPANX Hansen et al., 2009

• TSPY Variants (Krausz Lab)

• SHOX gene CNVs (Jorgez et al. 2011)

• Our pilot GWAS

University of Utah GWAS

• Illumina 370k array

• 146 samples after QC
  – Azoospermia (n = 35)
  – Severe oligozoospermia (n = 49)
  – Controls (n = 62)

• Several analysis strategies
  – Single locus tests for association
  – CNV “burden” analysis
  – ROH/Consanguinity

Single Locus Test – 100kb deletions

Chromosome  Gene       P-value  FDR
X           COX7B      0.02     0.13
X           ATP7A      0.02     0.13
X           PGAM4      0.02     0.13
Y           LOC728137  0.02     0.12
Y           TSPY1      0.02     0.12
Y           LOC728137  0.02     0.12
Y           LOC728137  0.02     0.12
X           KIAA1166   0.05     0.21
X           SPIN4      0.08     0.28
X           ARHGEF9    0.08     0.28
X           FAM123B    0.08     0.28
X           ASB12      0.08     0.28
X           MTMR8      0.08     0.28
X           ZC3H12B    0.08     0.28
X           LAS1L      0.08     0.28
X           MSN        0.08     0.28
X           VSIG4      0.08     0.28

Basis for CNV burden analysis

– Schizophrenia (International Schizophrenia Consortium Nature Sept 2008)

– Autism (Morrow review J Am Acad Child Adolesc Psychiatry Nov 2010)

– Idiopathic Mental Retardation (2-hit hypothesis; Girirajan et al. Nat Genet Mar 2010)

Increased deletion burden in patients

[Figure: two bar charts comparing the Azoo, Oligo, and Control groups. Left panel: mean # bps deleted/sample (axis 0 to 3.5E+06). Right panel: mean # large deletions/sample (axis 0 to 6).]

Runs of homozygosity (ROH)

Causes of long stretches of homozygosity:
• deletions
• recent selective pressure
• inbreeding
• population bottlenecks

Disease consequences:
• Genome-wide effects of homozygosity identified for heart disease/cancer/blood pressure/LDL cholesterol (McQuillan et al. 2008).
• Many pediatric diseases are recessive single gene disorders (Hildebrandt et al. 2009).
• ROH burden effect has been found in schizophrenic patients (Lencz et al. 2007).

Outlier 1: Cytogenetic anomaly(ies)? In an azoospermic patient

[Figure: histogram titled "Length of ROH all cohorts". x-axis: Mb (0 to 250); y-axis: Frequency (0 to 600).]

Outlier 2: Consanguinity? In a severely oligozoospermic patient

[Figure: histogram titled "Length of ROH all cohorts". x-axis: Mb (0 to 250); y-axis: Frequency (0 to 600).]

This sample has 9 ROHs, totaling ~3% of the genome. Largest is 43Mb (pictured).
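The summary figures quoted for this sample (number of ROHs, largest ROH, proportion of the genome in ROH) can be tabulated from a list of ROH segments roughly as sketched below. This is an illustration, not the analysis used in the study: the segment coordinates are made up so that they give 9 ROHs with a 43 Mb maximum, and the genome length `GENOME_BP` is an assumed round figure.

```python
# Assumed genome length in base pairs (hypothetical round figure for illustration).
GENOME_BP = 2_900_000_000


def summarize_roh(segments, genome_bp=GENOME_BP):
    """Summarize ROH segments given as (chromosome, start_bp, end_bp) tuples."""
    lengths = [end - start for _chrom, start, end in segments]
    total = sum(lengths)
    return {
        "n_roh": len(lengths),
        "total_bp": total,
        "largest_bp": max(lengths, default=0),
        "proportion": total / genome_bp,
    }


# Made-up segments: 9 ROHs totaling 87 Mb, the largest being 43 Mb.
example_segments = [
    ("chr1", 10_000_000, 53_000_000),
    ("chr3", 5_000_000, 17_000_000),
    ("chr4", 20_000_000, 28_000_000),
    ("chr6", 1_000_000, 7_000_000),
    ("chr8", 30_000_000, 35_000_000),
    ("chr10", 2_000_000, 6_000_000),
    ("chr12", 40_000_000, 44_000_000),
    ("chr15", 50_000_000, 53_000_000),
    ("chr18", 10_000_000, 12_000_000),
]

summary = summarize_roh(example_segments)
print(f"{summary['n_roh']} ROHs, largest {summary['largest_bp'] / 1e6:.0f} Mb, "
      f"{summary['proportion']:.1%} of genome")
```

The proportion depends directly on the genome length used as the denominator, so that choice should be stated alongside any reported percentage.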

Homozygosity by descent summary

[Figure: scatter plot of number of ROHs (y-axis, 0 to 70) against proportion of genome in ROH (x-axis, 0.000 to 0.025); legend: azoo, norm, ozoo.]

Excluding the case of UPD, there may be 4 or 5 cases of azoo- or oligozoospermia that show pathological homozygosity.

Conclusions

• Contribution of individual SNPs to male factor infertility is minor
• CNVs likely involved in some spermatogenic defects
• Spermatogenic defects associated with increased burden of large CNVs and ROHs
• Future work should focus on rare variants
• Increased funding and large, multi-center studies will be necessary to better identify the genetic basis for male infertility

Acknowledgements

University of Utah: Doug Carrell, Ben Emery, Tim Jenkins, Sue Hammoud, Lihua Liu, Jeanine Griffin, Luke Simon

University of Florence: Csilla Krausz, Ilaria Laface, Claudia Giachini

Fundacio Puigvert (Spain): Eduardo Ruiz-Castane

Washington University: Don Conrad

SHOX gene CNVs

• Microarray analysis of X and Y in men with Y microdeletions identified prevalence of SHOX duplications and deletions

Jorgez et al. JCEM 2011

Y Chromosome

[Figure: UCSC Genome Browser views of chrX, including the Xp22.33 region containing GTPBP6, SHOX, and PPP2R3B, with PLINK CNV tracks showing deletions and duplications in cases and controls.]