US 20150142331A1

(19) United States
(12) Patent Application Publication
Beim et al.
(10) Pub. No.: US 2015/0142331 A1
(43) Pub. Date: May 21, 2015

(54) METHODS AND DEVICES FOR ASSESSING RISK OF FEMALE INFERTILITY

(71) Applicant: Celmatix, Inc., New York, NY (US)

(72) Inventors: Piraye Yurttas Beim, New York, NY (US); Michael Elashoff, Redwood City, CA (US)

Publication Classification

(51) Int. Cl.
G06F 19/18 (2006.01)
G06F 19/00 (2006.01)
C12Q 1/68 (2006.01)

(52) U.S. Cl.
CPC ...... G06F 19/18 (2013.01); C12Q 1/6883 (2013.01); G06F 19/3431 (2013.01); C12Q

(22) Filed: Jan. 26, 2015

Related U.S. Application Data

(63) Continuation-in-part of application No. 14/107,800, filed on Dec. 16, 2013.

(60) Provisional application No. 61/932,226, filed on Jan. 27, 2014, provisional application No. 61/889,738, filed on Oct. 11, 2013, provisional application No. 61/737,693, filed on Dec. 14, 2012.

(57) ABSTRACT

The invention generally relates to methods and devices for assessing risk of female infertility. In certain aspects, methods of the invention involve obtaining a sample, conducting an assay on at least one infertility-associated biomarker, and assessing risk to the patient of developing early-onset decrease in fertility based upon results of the assay.

[Representative drawing: FertiArray; curves labeled "likelihood of getting pregnant" and "likelihood of infertility"]

Patent Application Publication May 21, 2015 Sheet 1 of 15 US 2015/0142331 A1

[FIGURE 1: FertiArray; curves labeled "likelihood of getting pregnant" and "likelihood of infertility"]

[FIGURE 2: Combining Clinical and Genetic Data Analytics]

[FIGURE 3: image text not recoverable]

[FIGURE 4: Copy Number Variants, Inversions, Translocations]

[FIGURE 5: image text not recoverable]

[FIGURE 6: image text not recoverable]

[FIGURE 7: pipeline diagram; legible labels include SNPs, structural variants, variant-level significance tests, gene-level significance tests]

[FIGURE 8: CNV, INDEL, SNP]

[FIG. 9: image text not recoverable]

[FIG. 10: image text not recoverable]

[FIG. 11: image text not recoverable]

[FIG. 12: legible labels include Group A, Group B]

[FIG. 13: image text not recoverable]

[FIG. 15: system diagram; legible labels include Processor, Server 409, Data File, Sequencer 455, Interface Module, Sequencer Computer, Computer 433, 451]

US 2015/0142331 A1 May 21, 2015

METHODS AND DEVICES FOR ASSESSING RISK OF FEMALE INFERTILITY

RELATED APPLICATIONS

[0001] This application is a continuation-in-part of U.S. Non-Provisional Ser. No. 14/107,800, filed Dec. 16, 2013, which claims priority to U.S. Provisional Nos. 61/889,738, filed Oct. 11, 2013, and 61/737,693, filed Dec. 14, 2012. This application also claims priority to and the benefit of U.S. Provisional No. 61/932,226, filed Jan. 27, 2014. The aforementioned applications are incorporated by reference herein.

TECHNICAL FIELD

[0002] The invention generally relates to methods and devices for assessing risk of female infertility.

BACKGROUND

[0003] Approximately one in seven couples has difficulty conceiving. Infertility may be due to a single cause in either partner, or a combination of factors (e.g., genetic factors, diseases, or environmental factors) that may prevent a pregnancy from occurring or continuing. Every woman will become infertile in her lifetime due to menopause. On average, egg quality and number begin to decline precipitously at 35. However, some women experience this decline much earlier in life, while a number of women are fertile well into their 40's. Though, generally, advanced maternal age (35 and above) is associated with poorer fertility outcomes, there is no way of diagnosing egg quality issues in younger women or knowing when a particular woman will start to experience decline in her egg quality or reserve.

[0004] The elucidation of the genetic basis of female infertility disorders permits the development of powerful, rapid, and non-invasive diagnostic tools that will help clinicians direct patients to efficient and effective treatment options. Additionally, the discovery of the key genes underlying these disorders holds great promise for the identification of novel targets for drug development and therapeutics. Finally, a better understanding of the crucial molecular pathways underlying human fertility guides the next generation of targeted, non-hormonal contraceptives.

SUMMARY

[0005] The invention provides applications and methods for determining the identity of genetic loci biologically or statistically correlated with increased risk of susceptibility of an individual to infertility or early-onset decrease in fertility (premature menopause). In one aspect, the invention provides nucleic acid sequences that can be used to assess the presence or absence of particular nucleotides at polymorphic sites in an individual's RNA or genomic DNA that are associated with susceptibility to decreased fertility. In certain aspects, the invention provides methods for observing commonly occurring or rare genetic variants within a subset of genes of interest for human infertility and risk of premature menopause. In certain aspects, the invention provides methods for ranking the relative importance of individual genetic variants, genes, or genetic regions for allowing determination of infertility or premature menopause risk. In certain aspects, the invention provides a method for identifying a human subject as having an increased risk for infertility or premature menopause, including the following steps: 1) obtaining a sample from a patient; 2) conducting an assay on at least one infertility-associated biomarker, and 3) assessing risk to the patient of developing early-onset decrease in fertility.

[0006] As discussed below, an array of genetic information concerning the status of various infertility-related genetic regions is used in order to assess the risk of a subject having an increased susceptibility to reduced fertility, premature menopause, or infertility. The genetic information may include one or more polymorphisms in one or more infertility-related genetic regions, mutations in one or more of those genetic regions, or particular epigenetic signatures affecting the expression of those genetic regions. The molecular consequence of these genetic region mutations could be one or a combination of the following: alternative splicing, lowered or increased RNA expression, and/or alterations in protein expression. These alterations could also include a different protein product being produced, such as one with reduced or increased activity, or a protein that elicits an abnormal immunological reaction. All of this information is significant in terms of informing a patient of her susceptibility to infertility or reduced fertility relative to her age or other relevant phenotypes such as hormone levels or ovarian follicle count.

[0007] In addition to looking exclusively at genomic information, by combining genetic information (e.g., polymorphisms, mutations, etc.) with phenotypic and/or environmental data, methods of the invention provide an additional level of clinical clarity. For example, polymorphisms in genes discussed below may provide information about a disposition toward infertility or reduced fertility. However, in certain cases, the clinical outcome may not be determinative unless combined with certain phenotypic and/or environmental information. Thus, methods of the invention provide for a combination of genetic predispositional analyses in combination with phenotypic and environmental exposure data in order to assess the potential for infertility or reduced fertility relative to age. Thus, in certain cases, genetic predisposition may be sufficient to make a diagnosis, but in other cases, the clinical outcome may not be clear based upon genetic analysis alone and the combination of genetic and phenotypic or environmental data must be used in order to assess the likelihood of infertility or reduced fertility.

[0008] In addition to providing information to women related to the risk of infertility or reduced fertility if she chooses to try for a child at a particular age, methods of the invention may also be used by a physician for treatment purposes, e.g., allowing a physician to make vitamin/drug recommendations to help reduce or eliminate the risk of early-onset reduction in fertility. For example, data herein show that a mutation in the CBS gene affects infertility. This data may be used by a physician to generate a treatment plan that may help remediate the infertility risk in the woman. For example, the physician may advise the woman to take a high dose of folic acid or other vitamin supplements/drugs in order to improve fertility. Such a treatment plan may reduce or eliminate the infertility risk in the woman.

[0009] A biomarker generally refers to a molecule that acts as an indicator of a biological state. In certain embodiments, the biomarker is a genetic region. In particular embodiments, the genetic region is an infertility-related genetic region. Any assay known in the art may be used to analyze the genetic region. In certain embodiments, the assay includes sequencing at least a portion of the genetic region to determine presence or absence of a mutation that is associated with infertility. Mutations detected according to the invention may be any type of genetic mutation. Exemplary mutations include a

single nucleotide polymorphism, a deletion, an insertion, an inversion, other rearrangements, a copy number variation, or a combination thereof. Any method of detecting genetic mutations is useful with methods of the invention, and numerous methods are known in the art. In certain embodiments, sequencing is used to determine the presence of a mutation in the infertility-associated genetic region. In particularly-preferred embodiments, the sequencing is sequencing-by-synthesis.

[0010] In other embodiments, the biomarker is a gene product. In particular embodiments, the gene product is a product of an infertility-related gene. The gene product may be RNA or protein. Any assay known in the art may be used to analyze the gene product. In certain embodiments, the assay involves determining an amount of the gene product and comparing the determined amount to a reference.

[0011] Methods of the invention may further involve obtaining a sample from the mammal that includes the infertility-associated biomarker. The sample may be a human tissue or body fluid. In particular embodiments, the sample is blood or saliva. Methods of the invention may also involve enriching the sample for the infertility-associated biomarker.

[0012] Methods of the invention may be used to assess the risk of infertility that is linked to an infertility-associated biomarker. Another aspect of the invention provides methods for assessing infertility that involve obtaining a sample, conducting an assay on at least one infertility-associated biomarker, and assessing level of fertility based on results of the assay.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] FIG. 1 depicts the rate of decline of fertility with age and the corresponding increase in the risk of infertility with age. The shaded areas represent different age groups who would benefit from a genetic screen for infertility risk (late teens to mid-40s) versus a genetic screen of premature decline in fertility (late teens to late 30's).

[0014] FIG. 2 depicts one way that phenotypic variables can be utilized to accelerate the discovery of genetic regions related to female infertility.

[0015] FIG. 3 depicts the methodology for integrating clinical data with genomic data to predict treatment dependent and independent fertility outcomes.

[0016] FIG. 4 depicts the different kinds of genetic variants associated with risk of infertility.

[0017] FIG. 5 depicts the method for filtering through variants detected in whole genome sequencing for the identification of genetic regions related to infertility.

[0018] FIG. 6 depicts some of the components of the Fertilome™ Database, a tool for correlating genetic regions with risk for infertility (Fertilome™ Score).

[0019] FIG. 7 is the bioinformatics pipeline used to identify biologically interesting and statistically significant genetic variants in infertile patients.

[0020] FIG. 8 shows the different types of biologically or statistically significant genetic variants that were detected in infertile patients in the MUC4 genetic region.

[0021] FIG. 9 provides CGH array data of copy number variations associated with infertility.

[0022] FIG. 10 illustrates a specific copy number variation detected in the GJC2 gene.

[0023] FIG. 11 illustrates a specific copy number variation detected in the CRTC1 and GDF1 genes of Chromosome 19.

[0024] FIG. 12 illustrates a specific copy number variation detected in a non-coding region of Chromosome 6.

[0025] FIG. 13 illustrates population stratification correction of two patient groups (A: patients who did not get pregnant with IVF treatment; B: patients with infertility who did get pregnant with IVF treatment).

[0026] FIG. 14 exemplifies a cluster analysis according to certain aspects.

[0027] FIG. 15 illustrates a system for implementing methods of the invention.

DETAILED DESCRIPTION

[0028] The invention generally relates to methods and devices for assessing risk of susceptibility to infertility, reduced fertility, or reduced fertility at a particular age including premature menopause. In certain embodiments, the invention provides methods for assessing risk of susceptibility to infertility or reduced fertility that involve obtaining a biological sample, conducting an assay on at least one infertility-associated biomarker, and assessing risk of infertility or reduced fertility based upon results of the assay.

Samples

[0029] Methods of the invention involve obtaining a sample, e.g., a tissue or body fluid, that is suspected to include an infertility-associated gene or gene product. The sample may be collected in any clinically acceptable manner. A tissue is a mass of connected cells and/or extracellular matrix material, e.g. skin tissue, hair, nails, endometrial tissue, nasal passage tissue, CNS tissue, neural tissue, eye tissue, liver tissue, kidney tissue, placental tissue, mammary gland tissue, gastrointestinal tissue, musculoskeletal tissue, genitourinary tissue, bone marrow, and the like, derived from, for example, a human or other mammal and includes the connecting material and the liquid material in association with the cells and/or tissues. A body fluid is a liquid material derived from, for example, a human or other mammal. Such body fluids include, but are not limited to, mucus, blood, plasma, serum, serum derivatives, bile, maternal blood, phlegm, saliva, sweat, amniotic fluid, menstrual fluid, endometrial aspirates, mammary fluid, follicular fluid of the ovary, fallopian tube fluid, peritoneal fluid, urine, and cerebrospinal fluid (CSF), such as lumbar or ventricular CSF. A sample may also be a fine needle aspirate or biopsied tissue. A sample also may be media containing cells or biological material. A sample may also be a blood clot, for example, a blood clot that has been obtained from whole blood after the serum has been removed. In certain embodiments, infertility-associated genes or gene products may be found in reproductive cells or tissues, such as gametic cells, gonadal tissue, fertilized embryos, and placenta. In certain embodiments, the sample is drawn blood or saliva.

[0030] Nucleic acid is extracted from the sample according to methods known in the art. See, for example, Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281, 1982, the contents of which are incorporated by reference herein in their entirety. In certain embodiments, a genomic sample is collected from a subject followed by enrichment for genetic regions or genetic fragments of interest, for example by hybridization to a nucleotide array comprising fertility-related genes or gene fragments of interest. The sample may be enriched for genes of interest (e.g., infertility-associated genes) using methods

known in the art, such as hybrid capture. See, for example, Lapidus (U.S. Pat. No. 7,666,593), the content of which is incorporated by reference herein in its entirety.

[0031] RNA may be isolated from eukaryotic cells by procedures that involve lysis of the cells and denaturation of the proteins contained therein. Tissue of interest includes gametic cells, gonadal tissue, endometrial tissue, fertilized embryos, and placenta. RNA may be isolated from fluids of interest by procedures that involve denaturation of the proteins contained therein. Fluids of interest include blood, menstrual fluid, mammary fluid, follicular fluid of the ovary, peritoneal fluid, or culture medium. Additional steps may be employed to remove DNA. Cell lysis may be accomplished with a nonionic detergent, followed by microcentrifugation to remove the nuclei and hence the bulk of the cellular DNA. In one embodiment, RNA is extracted from cells of the various types of interest using guanidinium thiocyanate lysis followed by CsCl centrifugation to separate the RNA from DNA (Chirgwin et al., Biochemistry 18:5294-5299 (1979)). Poly(A)+ RNA is selected by selection with oligo-dT cellulose (see Sambrook et al., MOLECULAR CLONING-A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)). Alternatively, separation of RNA from DNA can be accomplished by organic extraction, for example, with hot phenol or phenol/chloroform/isoamyl alcohol. If desired, RNase inhibitors may be added to the lysis buffer. Likewise, for certain cell types, it may be desirable to add a protein denaturation/digestion step to the protocol.

[0032] For many applications, it is desirable to preferentially enrich mRNA with respect to other cellular RNAs, such as transfer RNA (tRNA) and ribosomal RNA (rRNA). Most mRNAs contain a poly(A) tail at their 3' end. This allows them to be enriched by affinity chromatography, for example, using oligo(dT) or poly(U) coupled to a solid support, such as cellulose or SEPHADEX (see Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994)). Once bound, poly(A)+ mRNA is eluted from the affinity column using 2 mM EDTA/0.1% SDS.

Biomarkers

[0033] A biomarker generally refers to a molecule that may act as an indicator of a biological state. Biomarkers for use with methods of the invention may be any marker that is associated with infertility. Exemplary biomarkers include genes (e.g. any region of DNA encoding a functional product), genetic regions (e.g. regions including genes and intergenic regions with a particular focus on regions conserved throughout evolution in placental mammals), and gene products (e.g., RNA and protein). In certain embodiments, the biomarker is an infertility-associated genetic region. An infertility-associated genetic region is any DNA sequence in which variation is associated with a change in fertility. Examples of changes in fertility include, but are not limited to, the following: a homozygous mutation of an infertility-associated gene leads to a complete loss of fertility; a homozygous mutation of an infertility-associated gene is incompletely penetrant and leads to reduction in fertility that varies from individual to individual; a heterozygous mutation is completely recessive, having no effect on fertility; and the infertility-associated gene is X-linked, such that a potential defect in fertility depends on whether a non-functional allele of the gene is located on an inactive X chromosome (Barr body) or on an expressed X chromosome.

[0034] According to certain aspects, methods of the invention provide for determining infertility genetic regions of interest based on data obtained from public and private fertility/infertility related databases. Infertility/fertility related data may include implantation genes, idiopathic infertility genes, polycystic ovary syndrome (PCOS) genes, egg quality genes, endometriosis genes, and premature ovarian failure genes. As described below, the infertility/fertility related data can then be processed using evolutionary conservation to identify genomic regions and variations of interest.

[0035] Evolutionary conservation analysis involves, generally, comparing nucleic acid sequences among evolutionary and distantly related genomes to identify similarities and differences between coding and/or non-coding regions across the genomes. The similarity between a region being examined and the related genomes correlates to a degree of conservation. Regions (e.g. coding, non-coding regions, and intergenic regions flanking a gene) that maintain a high degree of similarity across genomes over time are considered highly conserved. Differences between the examined region and regions of related genomes indicate that the examined region has evolved over time. If the examined region is conserved among related genomes, the region is generally considered to exhibit or perform functions that are important for the species (i.e. functionally relevant). This is because genetic abnormalities at functionally important regions are typically harmful to the species, and are phased out over the evolutionary time span. Because functional elements are subject to selection, functional regions tend to evolve at slower rates than nonfunctional regions. A degree of conservation (e.g. degree of similarity between a target genomic region and related genomes) that is considered to be functionally relevant depends on the particular application. For example, a functionally relevant degree of conservation may be 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, etc. Regions of genes identified by evolutionary conservation as being functionally-relevant can then be used as regions of interest for diagnosing diseases and disorders, such as infertility.

[0036] According to certain embodiments, infertility regions of interest are identified by performing evolutionary conservation analysis of one or more genes obtained from infertility and/or fertility-related data. The process of filtering through infertility/fertility related databases using evolutionary conservation, according to the invention, is called the ABCoRE algorithm (see FIG. 6). For example, nucleic acid data obtained from the infertility/fertility related databases can be compared to distantly related genomes in order to assess conservation of the infertility-related nucleic acid. Regions of the nucleic acid determined to be conserved are classified as infertility regions of interest. In one embodiment, methods of the invention assess conservation of coding regions to determine infertility regions of interest. In another embodiment, methods of the invention assess conservation of non-coding regions to determine infertility regions of interest. In further embodiments, methods of the invention assess conservation of intergenic regions (i.e. a non-coding region flanking a gene) to determine infertility regions of interest. In other embodiments, conservation of both coding and non-coding regions is assessed to determine infertility regions of interest. In any of the above embodiments, coding, non-coding, and intergenic regions may be classified as an infertility region of interest if they have a degree of conservation of, for example, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, etc.

[0037] In particular aspects, the following method is employed to determine whether a genomic region is a fertility region of interest using conservation analysis. First, private and/or public nucleic acid data corresponding to infertility or fertility is obtained. Next, one or more genetic loci from that data is examined for conservation. The coding regions (i.e. exons) of a gene, non-coding regions of the gene, and/or regions flanking the gene (intergenic regions upstream and downstream from the gene being examined) are then analyzed for conservation. According to certain embodiments, if the coding region is found to be conserved (e.g. a degree of conservation 90% or above), the coding region is considered to be an infertility region of interest. The degree of conservation of the non-coding region is then compared to the degree of conservation of the coding region. If the degree of conservation of the non-coding region is similar to the degree of conservation of the coding region, then the non-coding region is also classified as an infertility region of interest. This degree of conservation comparison may also be used to determine whether intergenic regions flanking a gene should be classified as an infertility region of interest.

[0038] Conservation of coding and/or non-coding sequences is described in Hardison, R. C., Oeltjen, J., and Miller, W. 1997. Long human-mouse sequence alignments reveal novel regulatory elements: A reason to sequence the mouse genome. Genome Res. 7: 959-966; Brenner, S., Venkatesh, B., Yap, W. H., Chou, C. F., Tay, A., Ponniah, S., Wang, Y., and Tan, Y. H. 2002. Conserved regulation of the lymphocyte-specific expression of lck in the Fugu and mammals. Proc. Natl. Acad. Sci. 99: 2936-2941; Karolchik, Donna, et al. "Comparative genomic analysis using the UCSC genome browser." Comparative Genomics. Humana Press, 2008. 17-33; Santini, Simona, Jeffrey L. Boore, and Axel Meyer. "Evolutionary conservation of regulatory elements in vertebrate Hox gene clusters." Genome research 13.6a (2003): 1111-1122; Roth, F. P., Hughes, J. D., Estep, P. W., and Church, G. M. 1998. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole genome mRNA quantitation. Nat. Biotechnol. 16: 939-945; and Blanchette, M. and Tompa, M. 2002. Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res. 12: 739-748.

[0039] In particular embodiments, the infertility-associated genetic region is a maternal effect gene. Maternal effect genes are genes that have been found to encode key structures and functions in mammalian oocytes (Yurttas et al., Reproduction 139:809-823, 2010). Maternal effect genes are described, for example, in Christians et al. (Mol Cell Biol 17:778-88, 1997); Christians et al. (Nature 407:693-694, 2000); Xiao et al. (EMBO J. 18:5943-5952, 1999); Tong et al. (Endocrinology 145:1427-1434, 2004); Tong et al. (Nat Genet. 26:267-268, 2000); Tong et al. (Endocrinology, 140:3720-3726, 1999); Tong et al. (Hum Reprod 17:903-911, 2002); Ohsugi et al. (Development 135:259-269, 2008); Borowczyk et al. (Proc Natl Acad Sci USA., 2009); and Wu (Hum Reprod 24:415-424, 2009). The content of each of these is incorporated by reference herein in its entirety.

[0040] The above-described infertility genetic regions of interest may then be ranked according to significance using one or more of the following ranking schemes of the invention.

[0041] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 1 below. In Table 1, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0042] Table 1 below depicts one possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. The number of variants column corresponds to the experimental observations of these variants in a study of women with unexplained infertility. The most highly ranked (from top to bottom) genes in this list contained the most variants that were predicted to significantly affect protein structure and function (biologically significant) out of a list of fertility related genes. Genetic variants considered to be biologically significant include mutations that result in a change: 1) to a different amino acid predicted to alter the folding and/or structure of the encoded protein, 2) to a different amino acid occurring at a site with high evolutionary conservation in mammals, 3) that introduces a premature stop termination signal, 4) that causes a stop termination signal to be lost, 5) that introduces a new start codon, 6) that causes a start codon to be lost, 7) that disrupts a splicing signal, 8) that alters the reading frame or 9) that alters the dosage of encoded protein or RNA. All genetic variants detected from re-sequencing exclude sites where the variant allele is detected in only one chromosome (singletons) and sites sequenced in only one individual.

TABLE 1

Genomic loci containing biologically significant mutations ranked based on number of biologically significant variants observed in a study of unexplained female infertility.

Gene | Celmatix Gene ID | Entrez ID | HGNC ID | Number of variants detected | Variant description (type and count)
MUC4 | CMX-G0000006719 | 4585 | 7514 | 353 | Drastic nonsynonymous: 352; Start codon gained: 1
EPHA8 | CMX-G0000000415 | 2046 | 3391 | 23 | CNV loss: 23
LOXL4 | CMX-G0000016263 | 84171 | 17171 | 11 | CNV loss: 11
FGF8 | CMX-G0000016316 | 2253 | 3686 | 4 | CNV gain: 4
KISS1R | CMX-G0000026560 | 84634 | 4510 | 4 | CNV gain: 4
SCARB1 | CMX-G0000019991 | 949 | 1664 | 4 | Drastic nonsynonymous: 1; Start codon gained: 3
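The ranking scheme described for Table 1 counts, per gene, the variants classed as biologically significant, after excluding singletons and sites sequenced in only one individual. A minimal Python sketch of that counting step follows. It is an illustration under stated assumptions, not the applicants' implementation: the `Variant` record, the effect labels in `SIGNIFICANT_EFFECTS`, the alphabetical tie-break, and the placeholder gene names are all hypothetical and do not appear in the specification.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

# Hypothetical labels standing in for the nine categories of
# biologically significant change listed in the specification.
SIGNIFICANT_EFFECTS = frozenset({
    "drastic_nonsynonymous",         # alters folding and/or structure
    "conserved_site_nonsynonymous",  # change at a highly conserved site
    "stop_gained",
    "stop_lost",
    "start_gained",
    "start_lost",
    "splice_site",
    "frameshift",
    "dosage_change",                 # e.g. CNV gain or CNV loss
})


class Variant(NamedTuple):
    gene: str
    effect: str
    chromosomes_with_allele: int  # chromosomes carrying the variant allele
    individuals_sequenced: int    # individuals sequenced at this site


def rank_genes(variants: Iterable[Variant]) -> List[Tuple[str, int]]:
    """Return (gene, variant count) pairs, highest count first."""
    counts: Counter = Counter()
    for v in variants:
        # Exclude singletons and sites sequenced in only one individual.
        if v.chromosomes_with_allele < 2 or v.individuals_sequenced < 2:
            continue
        if v.effect in SIGNIFICANT_EFFECTS:
            counts[v.gene] += 1
    # Alphabetical tie-break is an assumption made for this sketch.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    demo = [
        Variant("GENE_A", "stop_gained", 4, 20),
        Variant("GENE_A", "start_gained", 2, 20),
        Variant("GENE_B", "splice_site", 3, 18),
        Variant("GENE_B", "synonymous", 5, 18),  # not biologically significant
        Variant("GENE_C", "frameshift", 1, 20),  # singleton, excluded
    ]
    for gene, count in rank_genes(demo):
        print(gene, count)
```

Sorting on the pair (negative count, gene name) keeps the ordering deterministic when several genes share the same count, which is the common case near the bottom of a ranking like this one.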

TABLE 1-continued

Gene | Celmatix Gene ID | Entrez ID | HGNC ID | Number of variants detected | Variant description (type and count)
BARD1 | CMX-G0000004834 | 580 | 952 | 3 | Drastic nonsynonymous: 1; Start codon gained: 1; Start codon lost: 1
DDX20 | CMX-G0000001412 | 11218 | 2743 | 3 | Start codon gained: 3
ECHS1 | CMX-G0000016594 | 1892 | 3151 | 3 | CNV gain: 2; CNV loss: 1
FMN2 | CMX-G0000002910 | 56776 | 14074 | 3 | Start codon gained: 3
FOXO3 | CMX-G0000010672 | 2309 | 3821 | 3 | CNV gain: 3
HS6ST1 | CMX-G0000004221 | 9394 | 5201 | 3 | Drastic nonsynonymous: 3
MAP3K2 | CMX-G0000004205 | 10746 | 6854 | 3 | CNV gain: 3
MST1 | CMX-G0000005619 | 4485 | 7380 | 3 | Drastic nonsynonymous: 2; Splice site acceptor: 1
MTRR | CMX-G0000008130 | 4552 | 7473 | 3 | Drastic nonsynonymous: 3
NLRP11 | CMX-G0000028188 | 204801 | 22945 | 3 | Drastic nonsynonymous: 2; Start codon gained: 1
NLRP14 | CMX-G0000016919 | 338323 | 22939 | 3 | Drastic nonsynonymous: 3
NLRP8 | CMX-G0000028191 | 126205 | 22940 | 3 | Drastic nonsynonymous: 2; Stop codon lost: 1
ASCL2 | CMX-G0000016707 | 430 | 739 | 2 | Start codon gained: 1; CNV gain: 1
BMP6 | CMX-G0000009564 | 654 | 1073 | 2 | CNV loss: 2
BRCA1 | CMX-G0000025305 | 672 | 1100 | 2 | Drastic nonsynonymous: 2
BRCA2 | CMX-G0000020222 | 675 | 1101 | 2 | Drastic nonsynonymous: 2
CENPI | CMX-G0000031175 | 2491 | 3968 | 2 | Start codon gained: 2
COMT | CMX-G0000029621 | 1312 | 2228 | 2 | Drastic nonsynonymous: 1; Start codon gained: 1
CYP11B1 | CMX-G0000013888 | 1584 | 2591 | 2 | CNV gain: 2
DAZL | CMX-G0000005296 | 1618 | 2685 | 2 | Start codon gained: 2
 | CMX-G0000010487 | 1915 | 3189 | 2 | Start codon gained: 2
 | CMX-G0000031614 | 2332 | 3775 | 2 | Drastic nonsynonymous: 1; Start codon gained: 1
GDF1 | CMX-G0000027183 | 2657 | 4214 | 2 | Drastic nonsynonymous: 1; CNV gain: 1
HK3 | CMX-G0000009361 | 3101 | 4925 | 2 | Drastic nonsynonymous: 2
IGF2 | CMX-G0000016702 | 3481 | 5466 | 2 | CNV gain: 2
ISG15 | CMX-G0000000029 | 9636 | 4053 | 2 | CNV gain: 2
JMY | CMX-G0000008593 | 133746 | 28916 | 2 | Drastic nonsynonymous: 2
KL | CMX-G0000020228 | 9365 | 6344 | 2 | Drastic nonsynonymous: 2
MTHFR | CMX-G0000000213 | 4524 | 7436 | 2 | Drastic nonsynonymous: 1; Start codon gained: 1
NLRP13 | CMX-G0000028190 | 126204 | 22937 | 2 | Drastic nonsynonymous: 2
 | CMX-G0000028192 | 126206 | 21269 | 2 | Drastic nonsynonymous: 2
NOBOX | CMX-G0000012690 | 135935 | 22448 | 2 | Drastic nonsynonymous: 2
PRKRA | CMX-G0000004587 | 8575 | 9438 | 2 | Drastic nonsynonymous: 1; Nonsynonymous start: 1
SDC3 | CMX-G0000000574 | 9672 | 10660 | 2 | Drastic nonsynonymous: 2
TACC3 | CMX-G0000006818 | 10460 | 11524 | 2 | Drastic nonsynonymous: 2
TLE6 | CMX-G0000026639 | 79816 | 30788 | 2 | CNV loss: 2
ACVR1C | CMX-G0000004406 | 130399 | 18123 | 1 | Drastic nonsynonymous: 1
AHR | CMX-G0000011332 | 196 | 348 | 1 | Start codon gained: 1
APOA1 | CMX-G0000018327 | 335 | 600 | 1 | CNV gain: 1
AURKA | CMX-G0000028967 | 6790 | 11393 | 1 | Start codon gained: 1
BMP15 | CMX-G0000030783 | 9210 | 1068 | 1 | CNV gain: 1

TABLE 1-continued Genomic loci containing biologically significant mutations ranked based on number of biologically significant variants observed in a study of unexplained female infertility.

Num Num ber- of ber- of War- Variant War- Variant iants Description iants Description Celmatix HGNC de- (type and Celmatix HGNC de- (type and Gene Gene ID Entrez ID ID tected count) Gene Gene ID Entrez ID ID tected count) BMP4 CMX- 652 1071 Stop codon LHCGR. CMX- 3973 6585 Drastic GOOOOO21216 lost: 1 GOOOOOO3462 nonsynon C6orf221 CMX- 154288 33699 Drastic ymous: GOOOOO10478 nonsynon MAD1L1 CMX- 8379 6762 Start codon ymous: 1 GOOOOO11200 gained: CASP8 CMX- 841 1509 CNW loss: MAD2L1 CMX- 408S 6763 Start codon GOOOOOO4721 1 GOOOOOO76SO gained: CBS CMX- 875 1550 Drastic MB21D1 CMX- 115004 21367 Drastic GOOOOO294.08 nonsynon GOOOOO10484 nonsynon ymous: 1 ymous: CDX2 CMX- O45 1806 Drastic MCM8 CMX- 8451S 16147 Drastic GOOOOO2O191 nonsynon GOOOOO284.33 nonsynon ymous: 1 ymous: CENPF CMX- O63 1857 Drastic MYC CMX- 4609 7553 Start codon GOOOOOO2670 nonsynon GOOOOO13826 gained: ymous: 1 NLRP2 CMX- 55.655 22948 Start codon CGB CMX- O82 1886 Start codon GOOOOO28140 gained: GOOOOO27860 gained: 1 NLRP4 CMX- 147945 22943 Start codon CSF1 CMX- 435 2432 CNW loss: GOOOOO28189 gained: GOOOOOO1374 1 OAS1 CMX- 4938 8086 Splice site CSF2 CMX- 437 2434 CNW loss: GOOOOO19838 acceptor: 1 GOOOOOO8885 1 PADI3 CMX- 51702 18337 CNV gain: DCTPP1 CMX- 79077 28777 CNV gain: GOOOOOOO342 1 GOOOOO23705 1 PAEP CMX- 5047 8573 CNV gain: DNMT1 CMX- 786 2976 Drastic GOOOOO15254 1 GOOOOO2688O nonsynon PLCB1 CMX- 23236 15917 CNV gain: ymous: 1 GOOOOO28445 1 EFNA4 CMX- 945 3224 CNW loss: PMS2 CMX- 5395 9122 Drastic GOOOOOO1896 1 GOOOOO11251 nonsynon EFNB3 CMX- 949 3228 CNV gain: ymous: 1 GOOOOO24616 1 POF1B CMX- 79983 13711 CNV gain: EIF3CL CMX- 728689 26347 CNW loss: GOOOOO31099 1 GOOOOO23621 1 PRDM9 CMX- S6979 13994 CNW loss: EPHAS CMX- 2044 3389 CNW loss: GOOOOOO8219 1 GOOOOOO7213 1 SEPHS2 CMX- 22928 19686 CNV gain: EPHA7 CMX- 2045 3390 CNW loss: GOOOOO23707 1 GOOOOO10603 1 SERPINA10 CMX- S11S6 15996 CNV gain: EZH2 CMX- 2146 3527 Drastic GOOOOO21629 1 GOOOOO12702 nonsynon SIRT3 CMX- 2341O 14931 CNW loss: 
ymous: 1 GOOOOO16629 1 FOXL2 CMX- 668 1092 Start codon SPN CMX- 101929889 11249 CNW loss: GOOOOOO6297 gained: 1 GOOOOO23664 1 FOXP3 CMX- SO943 6106 CNV gain: TFPI CMX- 7035 11760 Drastic GOOOOO3O7SO 1 GOOOOOO4632 nonsynon GALT CMX- 2592 4135 Splice site ymous: 1 GOOOOO14248 acceptor: 1 TGFB11 CMX- 7041 11767 CNV gain: GDF9 CMX- 2661 4224 Start codon GOOOOO23757 1 GOOOOOO8902 gained: 1 TP63 CMX- 8626 15979 Start codon GA4 CMX- 2701 4278 CNV gain: GOOOOOO6674 gained: 1 GOOOOOOO643 1 UBE3A CMX- 7337 12496 Start codon GJB3 CMX- 2707 428S CNV gain: GOOOOO22200 gained: 1 GOOOOOOO642 1 UBL4B CMX- 164153 32309 CNW loss: GJB4 CMX- 127534 4286 CNV gain: GOOOOOO1378 1 GOOOOOOO641 1 UIMC1 CMX- S1720 3O298 Drastic GOOOOOO9362 nonsynon GD3 CMX- 1251.11 19147 CNV gain: ymous: 1 GOOOOO2S169 1 WKORC1 CMX- 79001 23663 CNV gain: GPC3 CMX- 2719 4451 CNV gain: GOOOOO2374.1 1 GOOOOO31486 1 ZP3 CMX- 7784 1318.9 Start codon HSD17B2 CMX- 3294 S211 Drastic GOOOOO11947 gained: 1 GOOOOO24260 nonsynon ymous: 1 IGFBPL1 CMX- 347252 20081 CNW loss: GOOOOO14341 1 0043. In particular embodiments, the infertility-associ KISS1 CMX- 3814 6341 CNV gain: ated genetic region is a gene (including exons, introns, and GOOOOOO2S33 1 evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes US 2015/O 142331 A1 May 21, 2015

shown in Table 2 below. In Table 2, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0044] Table 2 below depicts another possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. Table 2 contains the 10 genes, listed in order from most to least statistically significant, that were determined to be statistically significantly correlated with infertility risk in a study of unexplained female infertility based on variants detected in the coding regions of these genes. P-values <0.025 are considered statistically significant, and all other fertility genes did not pass the significance test for inclusion and ranking in this list. For the coding level analysis, we first compute a coding variant score for the coding regions for each individual/gene. The coding variant score represents the variability of the gene at coding regions in an individual and is computed as the sum of the proportion of variant locations within the coding regions of that gene for that individual. A series of linear regression models are fit, where the outcome variable is the coding variant score for a given gene, and the independent variables are group (infertile vs. control) and principal component derived ethnicity (continuous). The p-value for group is used for statistical inference. The model is fit once for each gene.

TABLE 2
Fertility genes demonstrating statistical significance at the gene coding region level for infertility risk ranked based on p-values, observed in a study of unexplained female infertility.

Gene | Celmatix Gene ID | Entrez ID | HGNC ID | P-value
ZP4 | CMX-G0000002903 | 57829 | 15770 | 5.17E-10
UIMC1 | CMX-G0000009362 | 51720 | 30298 | 0.001401803
PADI6 | CMX-G0000000344 | 353238 | 20449 | 0.003420271
ZP1 | CMX-G0000017558 | 22917 | 13187 | 0.003845858
MDM2 | CMX-G0000019503 | 4193 | 6973 | 0.009323844
PRKRA | CMX-G0000004587 | 8575 | 9438 | 0.009832035
PMS2 | CMX-G0000011251 | 5395 | 9122 | 0.015453858
TGFB1 | CMX-G0000027588 | 7040 | 11766 | 0.018576967
ESR2 | CMX-G0000021326 | 2100 | 3468 | 0.022661688
PRDM1 | CMX-G0000010653 | 639 | 9346 | 0.024522163

[0045] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 3 below. In Table 3, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0046] Table 3 below depicts another possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. Table 3 contains the 11 genes, listed in order from most to least statistically significant, that were determined to be statistically significantly correlated with infertility risk in a study of unexplained female infertility based on variants detected in the coding, non-coding, and conserved upstream and downstream regions of the fertility gene. P-values <0.025 are considered statistically significant, and all other fertility genes did not pass the significance test for inclusion and ranking in this list. For the gene level analysis, we first compute a gene variant score for the entire transcript and flanking evolutionarily conserved regions for each individual/gene. The gene variant score represents the variability of the gene in an individual and is computed as the sum of the proportion of variant locations within that gene and its evolutionarily conserved regions flanking the gene for that individual. A series of linear regression models are fit, where the outcome variable is the gene variant score for a given gene, and the independent variables are group (infertile vs. control) and principal component derived ethnicity (continuous). The p-value for group is used for statistical inference. The model is fit once for each gene.

TABLE 3
Fertility genes demonstrating statistical significance at the entire gene level for infertility risk ranked based on p-values, observed in a study of unexplained female infertility.

Gene | Celmatix Gene ID | Entrez ID | HGNC ID | P-value
PADI6 | CMX-G0000000344 | 353238 | 20449 | 0.00079599
CGB | CMX-G0000027860 | 1082 | 1886 | 0.000983714
PMS2 | CMX-G0000011251 | 5395 | 9122 | 0.001500248
ESR2 | CMX-G0000021326 | 2100 | 3468 | 0.004733531
UIMC1 | CMX-G0000009362 | 51720 | 30298 | 0.005170633
ZP1 | CMX-G0000017558 | 22917 | 13187 | 0.00852914
MDM2 | CMX-G0000019503 | 4193 | 6973 | 0.009794758
BRCA2 | CMX-G0000020222 | 675 | 1101 | 0.019744499
TGFB1 | CMX-G0000027588 | 7040 | 11766 | 0.020358934
CDKN1C | CMX-G0000016717 | 1028 | 1786 | 0.022605239
TAF4B | CMX-G0000026229 | 6875 | 11538 | 0.024673723

[0047] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 4 below. In Table 4, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0048] Table 4 below depicts another possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. Table 4 contains the top ranked 100 fertility genes, listed in order from most to least likely for variants in that gene to affect fertility. Genes are ranked according to a Celmatix Fertilome™ Score, G1 Version2, that reflects the likelihood a gene is involved in fertility or reproduction. This score is computed using a database of mined and curated data, containing attributes for each gene in the genome (See FIGS. 5 and 6). These attributes include: diseases and disorders related to infertility, molecular pathways, molecular interactions, gene clusters, mouse phenotypes associated with each gene, gene expression data in reproductive tissues, proteomics data in oocytes, and accrued information from scientific publications through text-mining.

[0049] The process for ranking fertility-related attributes of a gene or genetic region (locus) to obtain an infertility score is called the SESMe algorithm. The SESMe algorithm is applied to a database of features and attributes that might make a particular gene important for fertility. The algorithm assigns a score and a relative weight to each feature, then ranks genetic regions from most to least important (or vice versa) by weighting features and attributes associated with that genetic region. For example, a score is assigned to a gene by compiling the combined weighted values of attributes associated with that gene. After each gene is scored based on its weighted attributes, the genes can be ranked in order of importance in accordance with their score. The weighted value for each infertility attribute may be scaled in any manner including and not limited to assigning a positive or negative integer to reflect the significance or severity of the attribute to infertility.

[0050] In certain embodiments, the weighted value for gene infertility attributes may be on a scale from -10 to +10. A +10 may indicate that an attribute of a gene being scored is highly associated with infertility because that attribute is prevalently found in infertile patient populations. A +4 may represent an attribute that is a latent infertility marker, meaning it will not cause infertility on its own, but may lead to infertility upon influence of external factors such as aging and smoking. A +2 may represent an attribute found in some infertile patients but nothing directly relates the attribute to infertility. A zero on the scale may include an attribute not yet known to have any effect or any negative effect towards infertility. A -10 may include an attribute shown not to affect infertility whatsoever. Further, embodiments provide for the weighted scale to include a +1 for attributes that are commonly found in infertile patient populations, 0.5 for attributes similar to those found in infertile patient populations, and 0 for attributes without a causal link to infertility.

[0051] In addition, weighted values for attributes may be normalized based on the known significance of that attribute towards infertility. For example and in certain embodiments, when scoring attributes of a particular gene, each attribute may be assigned a 0 if the attribute is absent and a 1 if the attribute is present. The attributes may then be normalized based on the infertility significance of that attribute. For example, if the attribute is a genetic mutation known to be associated with infertility, then that attribute may be normalized by a factor of 5. In another example, if the attribute is a signaling pathway defect sometimes associated with infertility, then that attribute may be normalized by a factor of 2.

[0052] Table 4, provided below, lists 100 Human Fertility Genes that were ranked by weighing attributes associated with the gene in accordance with methods of the invention.

TABLE 4
List of Top 100 Human Fertility Genes based on the Fertilome™ Score, G1 Version2.

Rank | Gene Symbol | Celmatix Gene ID | Entrez Gene ID | HGNC Gene ID | Celmatix Fertilome™ Score
1 | C6orf221 | CMX-G0000010478 | 154288 | 33699 | 15
2 | NLRP5 | CMX-G0000028192 | 126206 | 21269 | 15
3 | ZP3 | CMX-G0000011947 | 7784 | 13189 | 12.93
4 | FIGLA | CMX-G0000003616 | 344018 | 24669 | 12
5 | PADI6 | CMX-G0000000344 | 353238 | 20449 | 12
6 | DNMT1 | CMX-G0000026880 | 1786 | 2976 | 11.67
7 | ZP2 | CMX-G0000023549 | 7783 | 13188 | 11.67
8 | FSHR | CMX-G0000003464 | 2492 | 3969 | 11.37
9 | OOEP | CMX-G0000010479 | 441161 | 21382 | 11
10 | FOXO3 | CMX-G0000010672 | 2309 | 3821 | 10.39
11 | ACVR1B | CMX-G0000019186 | 91 | 172 | 10.14
12 | CGA | CMX-G0000010560 | 1081 | 1885 | 10.04
13 | INHA | CMX-G0000004914 | 3623 | 6065 | 10.02
14 | LHCGR | CMX-G0000003462 | 3973 | 6585 | 10.01
15 | DPPA3 | CMX-G0000018719 | 359787 | 19199 | 10
16 | KDM1B | CMX-G0000009642 | 221656 | 21577 | 10
17 | NOBOX | CMX-G0000012690 | 135935 | 22448 | 10
18 | NPM2 | CMX-G0000013114 | 10361 | 7930 | 10
19 | ESR1 | CMX-G0000011002 | 2099 | 3467 | 9.91
20 | AURKA | CMX-G0000028967 | 6790 | 11393 | 9.84
21 | BRCA2 | CMX-G0000020222 | 675 | 1101 | 9.75
22 | WT1 | CMX-G0000017126 | 7490 | 12796 | 9.53
23 | CBS | CMX-G0000029408 | 875 | 1550 | 9.49
24 | CDKN1C | CMX-G0000016717 | 1028 | 1786 | 9.37
25 | IGF1 | CMX-G0000019714 | 3479 | 5464 | 9.35
26 | HAND2 | CMX-G0000007954 | 9464 | 4808 | 9.17
27 | GDF9 | CMX-G0000008902 | 2661 | 4224 | 9
28 | MAD2L1 | CMX-G0000007650 | 4085 | 6763 | 9
29 | ZAR1 | CMX-G0000007128 | 326340 | 21436 | 9
30 | FOXL2 | CMX-G0000006297 | 668 | 1092 | 8.88
31 | BARD1 | CMX-G0000004834 | 580 | 952 | 8.54
32 | FMN2 | CMX-G0000002910 | 56776 | 14074 | 8.4
33 | TACC3 | CMX-G0000006818 | 10460 | 11524 | 8.39
34 | MYC | CMX-G0000013826 | 4609 | 7553 | 8.25
35 | IL11RA | CMX-G0000014249 | 3590 | 5967 | 7.9
36 | MCM8 | CMX-G0000028433 | 84515 | 16147 | 7.85
37 | LHB | CMX-G0000027859 | 3972 | 6584 | 7.82
38 | TAF4B | CMX-G0000026229 | 6875 | 11538 | 7.68
39 | USP9X | CMX-G0000030612 | 8239 | 12632 | 7.67
40 | PRLR | CMX-G0000008271 | 5618 | 9446 | 7.58
41 | HSF1 | CMX-G0000013948 | 3297 | 5224 | 7.35

TABLE 4-continued
List of Top 100 Human Fertility Genes based on the Fertilome™ Score, G1 Version2.

Rank | Gene Symbol | Celmatix Gene ID | Entrez Gene ID | HGNC Gene ID | Celmatix Fertilome™ Score
42 | FSHB | CMX-G0000017113 | 2488 | 3964 | 7.33
43 | ZP1 | CMX-G0000017558 | 22917 | 13187 | 7.29
44 | MDM2 | CMX-G0000019503 | 4193 | 6973 | 7.27
45 | BMP15 | CMX-G0000030783 | 9210 | 1068 | 7.25
46 | GPC3 | CMX-G0000031486 | 2719 | 4451 | 7.11
47 | PRDM1 | CMX-G0000010653 | 639 | 9346 | 7.05
48 | FST | CMX-G0000008371 | 10468 | 3971 | 7
49 | EZH2 | CMX-G0000012702 | 2146 | 3527 | 6.91
50 | SMAD2 | CMX-G0000026329 | 4087 | 6768 | 6.89
51 | NODAL | CMX-G0000015959 | 4838 | 7865 | 6.88
52 | ACVR1 | CMX-G0000004407 | 90 | 171 | 6.81
53 | HSD17B12 | CMX-G0000017190 | 51144 | 18646 | 6.71
54 | BRCA1 | CMX-G0000025305 | 672 | 1100 | 6.67
55 | DICER1 | CMX-G0000021645 | 23405 | 17098 | 6.53
56 | ESR2 | CMX-G0000021326 | 2100 | 3468 | 6.47
57 | MDM4 | CMX-G0000002542 | 4194 | 6974 | 6.42
58 | AR | CMX-G0000030935 | 367 | 644 | 6.41
59 | SCARB1 | CMX-G0000019991 | 949 | 1664 | 6.39
60 | CDKN1B | CMX-G0000018846 | 1027 | 1785 | 6.25
61 | TP53 | CMX-G0000024614 | 7157 | 11998 | 6.23
62 | NOG | CMX-G0000025542 | 9241 | 7866 | 6.22
63 | IL6ST | CMX-G0000008398 | 3572 | 6021 | 6.13
64 | DAZL | CMX-G0000005296 | 1618 | 2685 | [not legible]
65 | NLRP11 | CMX-G0000028188 | 204801 | 22945 | [not legible]
66 | NLRP13 | CMX-G0000028190 | 126204 | 22937 | [not legible]
67 | NLRP8 | CMX-G0000028191 | 126205 | 22940 | [not legible]
68 | NLRP9 | CMX-G0000028184 | 338321 | 22941 | [not legible]
69 | ZFX | CMX-G0000030503 | 7543 | 12869 | 5.67
70 | TFPI | CMX-G0000004632 | 7035 | 11760 | 5.36
71 | HSD17B7 | CMX-G0000002148 | 51478 | 5215 | 5.32
72 | TP63 | CMX-G0000006674 | 8626 | 15979 | 5.28
73 | NR5A1 | CMX-G0000015051 | 2516 | 7983 | 5.24
74 | BMP7 | CMX-G0000028985 | 655 | 1074 | 5.09
75 | CGB | CMX-G0000027860 | 1082 | 1886 | [not legible]
76 | CGB5 | CMX-G0000027866 | 93659 | 16452 | [not legible]
77 | DDX43 | CMX-G0000010483 | 55510 | 18677 | 5
78 | FMR1 | CMX-G0000031614 | 2332 | 3775 | 5
79 | LIN28B | CMX-G0000010647 | 389421 | 32207 | 5
80 | NLRP14 | CMX-G0000016919 | 338323 | 22939 | 5
81 | NLRP4 | CMX-G0000028189 | 147945 | 22943 | 5
82 | NLRP7 | CMX-G0000028139 | 199713 | 22947 | 5
83 | PROK1 | CMX-G0000001385 | 84432 | 8454 | 5
84 | SPIN1 | CMX-G0000014689 | 10927 | 11243 | 5
85 | TFPI2 | CMX-G0000012044 | 7980 | 1761 | 5
86 | ZP4 | CMX-G0000002903 | 57829 | 15770 | 5
87 | ESRRB | CMX-G0000021489 | 2103 | 3473 | 4.8
88 | UBE3A | CMX-G0000022200 | 7337 | 12496 | 4.76
89 | SUZ12 | CMX-G0000025003 | 23512 | 17101 | 4.73
90 | XIST | CMX-G0000031023 | 7503 | 2810 | 4.7
91 | ATM | CMX-G0000018234 | 472 | 795 | 4.62
92 | AURKB | CMX-G0000024639 | 9212 | 1390 | 4.55
93 | STK3 | CMX-G0000013673 | 6788 | 1406 | 4.52
94 | POLG | CMX-G0000023009 | 5428 | 9179 | 4.51
95 | CDX2 | CMX-G0000020191 | 1045 | 1806 | 4.46
96 | TP73 | CMX-G0000000110 | 7161 | 12003 | 4.43
97 | MTOR | CMX-G0000000201 | 2475 | 3942 | 4.42
98 | AHR | CMX-G0000011332 | 196 | 348 | 4.41
99 | LIF | CMX-G0000029949 | 3976 | 6596 | 4.38
100 | PRKRA | CMX-G0000004587 | 8575 | 9438 | 4.38

[0053] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 5 below. In Table 5, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0054] Table 5 below depicts another possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. Table 5 contains the top ranked 100 fertility genes, listed in order from most to least likely for variants in that gene to affect fertility. Genes are ranked according to a Celmatix Fertilome™ Score, G1 Version3, that reflects the likelihood a gene is involved in fertility or reproduction. This score is computed using a database of mined and curated data, containing attributes for each gene in the genome (See FIGS. 5 and 6). These attributes include: diseases and disorders related to infertility, molecular pathways, molecular interactions, gene clusters, mouse phenotypes associated with each gene, gene expression data in reproductive tissues, proteomics data in oocytes, and accrued information from scientific publications through text-mining. The Celmatix Fertilome™ Score, G1 Version3 differs from G1 Version2 (Table 4) because it contains more fertility genes as an input for the score calculation.

TABLE 5
List of Top 100 Human Fertility Genes based on the Fertilome™ Score, G1 Version3.

Entrez Gene HGNC Celmatix Fertilome TM Rank Gene Symbol Celmatix Gene ID ID Gene ID Score 1 C6orf221 CMX-GOOOOO10478 154288 33699 5 2 NLRP5 CMX-GOOOOO28.192 1262O6 21269 5 3 TCL1A CMX-GOOOOO21654 8115 11648 4 4 ZP3 CMX-GOOOOO11947 7784 131.89 2.93 5 FIGLA CMX-GOOOOOO3616 344O18 24669 2 6 PADI6 CMX-GOOOOOOO344 353.238 20449 2 7 RSPO1 CMX-GOOOOOOO687 284.654 21679 2 8 EPHA1 CMX-GOOOOO126SO 2041 3385 1.82 9 DNMT1 CMX-GOOOOO2688O 1786 2976 1.67 10 ZP2 CMX-GOOOOO23549 7783 13188 1.67 11 MOS CMX-GOOOOO13392 4342 71.99 1.5 12 FSHR CMX-GOOOOOO3464 2492 3969 1.37 13 OOEP CMX-GOOOOO10479 441161 21.382 1 14 CUL1 CMX-GOOOOO12701 8454 2551 O.67 1S HSP90B1 CMX-GOOOOO19724 7184 12028 0.57 16 FOXO3 CMX-GOOOOO10672 2309 38.21 O.39 17 KISS1 CMX-GOOOOOO2S33 3814 6341 O.21 18 ACVR1B CMX-GOOOOO1918.6 91 172 O.14 19 CGA CMX-GOOOOO 10560 1081 1885 O.04 2O INHA CMX-GOOOOOO4914 3623 606S O.O2 21 LHCGR. CMX-GOOOOOO3462 3973 6585 O.O1 22 DPPA3 CMX-GOOOOO18719 359787 19199 O 23 KDM1B CMX-GOOOOOO9642 221 656 21577 O 24 NOBOX CMX-GOOOOO12690 135935 22448 O 25 NPM2 CMX-GOOOOO13114 10361 7930 O 26 PRMT3 CMX-GOOOOO17073 101.96 30163 O 27 GA4 CMX-GOOOOOOO643 2701 4278 9.92 28 ESR1 CMX-GOOOOO110O2 2099 3467 9.91 29 SFRP4 CMX-GOOOOO11506 6424 10778 9.89 30 AURKA CMX-GOOOOO28967 6790 11393 9.84 31 BRCA2 CMX-GOOOOO2O222 675 1101 9.75 32 WT1 CMX-GOOOOO17126 7490 12796 9.53 33 CBS CMX-GOOOOO294.08 875 1550 949 34 CDKN1C CMX-GOOOOO16717 1028 1786 9.37 35 IGF1 CMX-GOOOOO19714 3479 5464 9.35 36 PLCB1 CMX-GOOOOO28445 23236 15917 9.33 37 CEP290 CMX-GOOOOO19604 80184 29021 9.3 38 MSHS CMX-GOOOOO1OOOO 4439 7328 9.29 39 HAND2 CMX-GOOOOOO7954 9464 4808 9.17 40 GDF9 CMX-GOOOOOO8902 2661 4224 9 41 MAD2L1 CMX-GOOOOOO76SO 4085 6763 9 42 TNFAIP6 CMX-GOOOOOO4377 7130 11898 9 43 ZAR1 CMX-GOOOOOO7128 326340 20436 9 44 FOXL2 CMX-GOOOOOO6297 668 1092 8.88 45 PCNA CMX-GOOOOO28417 S111 8729 8.78 46 YBX2 CMX-GOOOOO24578 51087 17948 8.57 47 BARD1 CMX-GOOOOOO4834 S8O 952 8.57 48 AMBP CMX-GOOOOO14963 259 453 8.4 4.9 FMN2 
CMX-GOOOOOO2910 56776 14074 8.4 SO NCOA2 CMX-GOOOOO13477 10499 7669 8.4 51 TEX12 CMX-GOOOOO18279 56158 11734 8.4 S2 TACC3 CMX-GOOOOOO6818 10460 11524 8.39 53 PGR CMX-GOOOOO18173 5241 8910 8.37 54 FANCC CMX-GOOOOO14774 2176 3584 8.25 55 MYC CMX-GOOOOO13826 4609 7553 8.25 56 FGF8 CMX-GOOOOO16316 2253 3686 8.23 57 SMADS CMX-GOOOOOO8943 4090 6771 8-12 58 CCS CMX-GOOOOO17793 9973 1613 8 S9 MSH4 CMX-GOOOOOO1108 44.38 7327 8 6O SPO11 CMX-GOOOOO28986 23626 112SO 8 61 SYCE1 CMX-GOOOOO16602 93426 28852 8 62 SYCP1 CMX-GOOOOOO1457 6847 11487 8 63 TFAP2C CMX-GOOOOO28982 7022 11744 8 64 WNT7A CMX-GOOOOOOS260 7476 12786 7.96 US 2015/O 142331 A1 May 21, 2015 11

TABLE 5-continued
List of Top 100 Human Fertility Genes based on the Fertilome™ Score, G1 Version3.

Entrez Gene HGNC Celmatix Fertilome TM Rank Gene Symbol Celmatix Gene ID ID Gene ID Score 65 IL11RA CMX-GOOOOO14249 3590 5967 7.9 66 MCM8 CMX-GOOOOO28433 84515 16147 7.85 67 SYCP2 CMX-GOOOOO29O20 10388 11490 7.85 68 INHIBA CMX-GOOOOO11SSO 3624 6066 7.83 69 MGAT1 CMX-GOOOOOO945.1 4245 7044 7.83 70 LHB CMX-GOOOOO27859 3972 6584 7.82 71 CYP19A1 CMX-GOOOOO22537 1588 2594 7.74 72 GGT1 CMX-GOOOOO29874 2678 42SO 7.71 73 TAFB4 CMX-GOOOOO26229 6875 11.538 7.68 74 SMC1B CMX-GOOOOO3O247 27127 11112 7.67 75 USP9X CMX-GOOOOO30612 8239 12632 7.67 76 PRLR CMX-GOOOOOO8271 S618 9446 7.58 77 DNMT3B CMX-GOOOOO28640 1789 2979 7.54 78 SOD1 CMX-GOOOOO29.263 6647 11179 7.54 79 SH2B1 CMX-GOOOOO23639 25970 3O417 7.5 8O HOXA11 CMX-GOOOOO11417 32O7 5101 748 81 UBB CMX-GOOOOO24729 7314 12463 7.43 82 HSF1 CMX-GOOOOO13948 3.297 5224 7.35 83 CYP17A1 CMX-GOOOOO16340 1586 2593 7.33 84 FSHB CMX-GOOOOO17113 2488 3964 7.33 85 SYCP3 CMX-GOOOOO197O6 50511 18130 7.33 86 NOS3 CMX-GOOOOO12751 4846 7876 7.31 87 ZP1 CMX-GOOOOO17558 22917 131.87 7.29 88 GNRHR CMX-GOOOOOO7221 2798 4421 7.27 89 MDM2 CMX-GOOOOO19503 4193 6973 7.27 90 BMP15 CMX-GOOOOO3O783 9210 1068 7.25 91 KDM1A CMX-GOOOOOOO422 23028 29079 7.25 92 MDK CMX-GOOOOO17221 4.192 6972 7.21 93 MSX2 CMX-GOOOOOO9331 4488 7392 7.21 94 CTNNB1 CMX-GOOOOOOS462 1499 2S1.4 7.2 95 NRIP1 CMX-GOOOOO29160 82O)4 8001 7.2 96 UBC CMX-GOOOOO19992 7316 12468 7.2 97 FKBP4 CMX-GOOOOO18615 2288 3.720 7.19 98 MLH3 CMX-GOOOOO21470 27.030 7128 7.14 99 MSX1 CMX-GOOOOOO6873 4487 7391 7.13 100 GPC3 CMX-GOOOOO31486 2719 4451 7.11
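The attribute weighting that underlies the Fertilome™ Score rankings (each attribute scored 0 if absent and 1 if present, normalized by a significance factor, combined per gene, then ranked) can be sketched as follows. This is a minimal illustration and not the SESMe algorithm itself: the attribute names, the third factor, the function names, and the example genes are hypothetical, and only the factors 5 and 2 come from the examples in the text.

```python
from __future__ import annotations

# Hypothetical normalization factors per attribute. The text gives 5
# (mutation known to be associated with infertility) and 2 (signaling
# pathway defect sometimes associated with infertility) as examples;
# the attribute names and the third entry are illustrative assumptions.
ATTRIBUTE_FACTORS = {
    "known_infertility_mutation": 5.0,
    "signaling_pathway_defect": 2.0,
    "expressed_in_oocytes": 1.0,
}


def gene_score(present: set[str], factors: dict[str, float] = ATTRIBUTE_FACTORS) -> float:
    """Sum of presence indicators (1 if present, else 0) times their factors."""
    return sum(factor * (1 if name in present else 0) for name, factor in factors.items())


def rank_genes(genes: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (gene, score) pairs, highest score first; ties broken by name."""
    scored = [(gene, gene_score(attrs)) for gene, attrs in genes.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up genes and attributes, for illustration only.
example = {
    "GENE_A": {"known_infertility_mutation", "expressed_in_oocytes"},
    "GENE_B": {"signaling_pathway_defect"},
    "GENE_C": set(),
}
ranking = rank_genes(example)
print(ranking)
```

Breaking ties by gene name is a choice made here only to keep the output deterministic; the source does not say how tied scores are ordered.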

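The coding-level and gene-level analyses behind Tables 2 and 3 compute a variant score per individual and regress it on group and ancestry, one model per gene. The sketch below shows one way this could look under stated assumptions: each region is summarized as a (variant sites, total sites) pair, a single principal component stands in for the ancestry covariate (the source does not say how many components were used), and all function names are hypothetical. It returns the group coefficient and its t statistic; turning that into a p-value requires a Student t distribution with n - 3 degrees of freedom (for example `scipy.stats.t.sf`, which is not used here).

```python
from __future__ import annotations


def variant_score(regions: list[tuple[int, int]]) -> float:
    """Sum over regions of (variant sites / total sites).

    Which regions are passed in (coding regions only, or the transcript plus
    flanking conserved regions) is an assumption left to the caller.
    """
    return sum(v / n for v, n in regions if n > 0)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_group_effect(scores, group, ancestry_pc):
    """OLS fit of score ~ intercept + group + ancestry_pc.

    Returns (beta_group, t_statistic) for the group term.
    """
    rows = [[1.0, float(g), float(pc)] for g, pc in zip(group, ancestry_pc)]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, scores)]
    sigma2 = sum(e * e for e in resid) / (len(scores) - k)
    # Var(beta_group) = sigma2 * [(X'X)^-1][1][1]; solving X'X z = e_1
    # yields column 1 of the inverse, whose entry 1 is the value needed.
    inv_col = solve(xtx, [0.0, 1.0, 0.0])
    se = (sigma2 * inv_col[1]) ** 0.5
    return beta[1], beta[1] / se


# Synthetic data for one gene: 3 controls (0) and 3 infertile (1).
group = [0, 0, 0, 1, 1, 1]
pc = [-1, 0, 1, -1, 0, 1]
noise = [0.001, -0.002, 0.001, -0.001, 0.002, -0.001]
scores = [0.1 + 0.05 * g + 0.02 * p + e for g, p, e in zip(group, pc, noise)]
beta_group, t_stat = fit_group_effect(scores, group, pc)
print(beta_group, t_stat)
```

In practice the fit would be repeated once per gene, as the text describes, and a statistics package would normally replace the hand-written solver.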
[0055] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 6 below. In Table 6, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0056] Table 6 below depicts another possible gene ranking scheme for the relative infertility, subfertility, or premature decline in fertility risk associated with novel or common mutations or variants in a fertility gene. Table 6 contains the top ranked fertility genes based on a comparison of how often the gene appears in one of the lists above (Tables 1-5). This list represents the top 20 genetic regions with utility for diagnosing female infertility, subfertility, or premature decline in fertility. These targets were identified using a compendium of factors: 1) Carrying statistically significant genetic mutations at the coding level in a pilot study, 2) Carrying statistically significant genetic mutations at the coding level in a pilot study, 3) Carrying genetic variations in our pilot study that impact the biochemical properties of the gene, 4) Highly ranked in our Celmatix Fertilome™ Score system, that reflects the likelihood a gene is involved in fertility or reproduction.

TABLE 6
List of the Top 20 Fertility Genes (arranged in alphabetical order)

Gene Symbol | Celmatix Gene ID | Entrez Gene ID | HGNC Gene ID
BARD1 | CMX-G0000004834 | 580 | 952
C6orf221 | CMX-G0000010478 | 154288 | 33699
DNMT1 | CMX-G0000026880 | 1786 | 2976
FMR1 | CMX-G0000031614 | 2332 | 3775
FOXO3 | CMX-G0000010672 | 2309 | 3821
MUC4 | CMX-G0000006719 | 4585 | 7514
NLRP11 | CMX-G0000028188 | 204801 | 22945
NLRP14 | CMX-G0000016919 | 338323 | 22939
NLRP5 | CMX-G0000028192 | 126206 | 21269
NLRP8 | CMX-G0000028191 | 126205 | 22940
NPM2 | CMX-G0000013114 | 10361 | 7930
PADI6 | CMX-G0000000344 | 353238 | 20449
PMS2 | CMX-G0000011251 | 5395 | 9122
SCARB1 | CMX-G0000019991 | 949 | 1664
SPIN1 | CMX-G0000014689 | 10927 | 11243
TACC3 | CMX-G0000006818 | 10460 | 11524
ZP1 | CMX-G0000017558 | 22917 | 13187
ZP2 | CMX-G0000023549 | 7783 | 13188
ZP3 | CMX-G0000011947 | 7784 | 13189
ZP4 | CMX-G0000002903 | 57829 | 15770

[0057] In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility selected from the genes shown in Table 7 below. In Table 7, HGNC (http://www.genenames.org/) reference numbers are provided when available.

[0058] Table 7 below depicts all of the biologically and/or statistically significant variants detected in the genes depicted in Table 6 in a genetic study of female infertility. Genetic variants considered to be biologically significant include mutations that result in a change: 1) to a different amino acid predicted to alter the folding and/or structure of the encoded protein, 2) to a different amino acid occurring at a highly evolutionarily conserved site, 3) that introduces a premature stop termination signal, 4) that causes a stop termination signal to be lost, 5) that introduces a new start codon, 6) that causes a start codon to be lost, 7) that disrupts a splicing signal, 8) that alters the reading frame or 9) that alters the dosage of encoded protein or RNA. All genetic variants detected from resequencing exclude sites at the single nucleotide level where the variant allele is detected in only one chromosome (singletons) and sites sequenced in only one individual. Structural variants impacting biological function are also reported. Using these criteria applied to targeted re-sequencing data from a study of infertile females, we detected 490 variants, of which 379 are listed in Table 7.

[0059] For the statistically significant variant level analysis, a series of logistic regression models are fit, where the outcome variable is the binary indicator of variant status for a given location, and the independent variables are group (infertile vs. control) and principal component-derived ethnicity (continuous). The p-value and odds ratio for group are used for statistical inference. The model is fit once for each location. P-values <0.001 are considered statistically significant. We performed a SNP association study by targeted re-sequencing and identified a total of 147 SNPs significantly associated with female infertility (of which 52 are reported in Table 7). Each variant was classified as novel or known. Novel sites are excluded from the p-value computation. For known variants, we apply a series of logistic regression models where the outcome variable is the binary indicator of variant status for a given location, and the independent variables are group (infertile vs. control) and principal component-derived ethnicity (continuous). The p-value and odds ratio for group are used for statistical inference. P-values less than 0.001 were considered significant. Position refers to NCBI Build 37. Alleles are reported on the forward strand. Ref=Reference allele, Alt=Variant allele.

TABLE 7
List of Biologically and Statistically Significant Genetic Variants Most Useful for Predicting Infertility Risk in Humans (arranged in alphabetical order by gene name)

Gene Symbol | Celmatix Gene ID | Celmatix Variant ID | Location | Ref | Alt | Impact | P-value

APOA CMX- CMX- chr11:112553969- NA CNV APOA1 NA GOOOOO18327 V1388879 126265772 gain (3 exons) ASCL2 CMX- CMX- chr11:2234334- NA CNV ASCL2 NA GOOOOO16707 V1067111 22987O6 gain (1 exon) BARD CMX- CMX- chr2:215674224 G A. Drastic NA GOOOOOO4834 V9083698 nonsynonymous BARD CMX- CMX- chr2:21.5595.645 C T Start codon NA GOOOOOO4834 V9083699 lost BARD CMX- CMX- chr2:215674323 C G Start codon NA GOOOOOO4834 V9083700 gained BARD CMX- CMX- chr2:21564SSO2 GTGGTG. G. Codon NA GOOOOOO4834 SVOOOO1 AAGAA deletion CATTCA GGCAA BARD CMX- CMX- chr2:215742204 G T NA 6.77E-OS GOOOOOO4834 V9084177 BMP15 CMX- CMX- chrX:SO639969- NA CNV BMP15 NA GOOOOO3O783 V1250.077 SO98.1841 gain (2 exons) BMP6 CMX- CMX- chr6:7726514- NA CNV BMP6 NA GOOOOOO9564 V1247770 7727614 loss (1 exon) BMP6 CMX- CMX- chr6:7724859- NA CNV BMP6 NA GOOOOOO9564 V1166409 7728905 loss (1 exon) C6orf221 CMX- CMX- chr6:74O73531 C G Drastic NA GOOOOO10478 V90837O6 nonsynonymous CASP8 CMX- CMX- chr2:2O1851 129- NA CNV CASP8 NA GOOOOOO4721 V1843349 2O3110758 loss (2 exons) CSF1, CMX- CMX- chr1:110441465- NA CNV CSF1 NA UBL4B GOOOOOO1374, V1667025 110831379 loss (4 exons), CMX- UBL4B GOOOOOO1378 (1 exon) CSF2 CMX- CMX- chrs: 12832O218- NA CNV CSF2 NA GOOOOOO888S W1456214 13144O732 loss (4 exons) CYP11B1 CMX- CMX- chr8:143951813- NA CNV CYP11B1 NA GOOOOO13888 V1957973 143958440 gain (4 exons) CYP11B1 CMX- CMX- chr8:1439.534O3- NA CNV CYP11B1 NA GOOOOO13888 V1609269 14399.1713 gain (4 exons)


TABLE 7-continued
List of Biologically and Statistically Significant Genetic Variants Most Useful for Predicting Infertility Risk in Humans (arranged in alphabetical order by gene name)

Gene Symbol | Celmatix Gene ID | Celmatix Variant ID | Location | Ref | Alt | Impact | P-value

MAP3K2 CMX- CMX- cr:12752O276- NA CNV MAP3K2 NA GOOOOOO42OS V1696049 128116794 gain (16 exons) MUC4 CMX- CMX- chr3:195SOS739 C T Drastic NA GOOOOOO6719 V9083756 nonsynonymous MUC4 CMX- CMX- chr3:195SOS960 G C Drastic NA GOOOOOO6719 V9083757 nonsynonymous MUC4 CMX- CMX- chr3:195SO6089 G A. Drastic NA GOOOOOO6719 V90837S78 nonsynonymous MUC4 CMX- CMX- chr3:195SO6099 T C Drastic NA GOOOOOO6719 V90837.59 nonsynonymous MUC4 CMX- CMX- chr3:195SOS883 T C Drastic NA GOOOOOO6719 V9083760 nonsynonymous MUC4 CMX- CMX- chr3:1955O1149 C T Drastic NA GOOOOOO6719 V9083761 nonsynonymous MUC4 CMX- CMX- chr3:195SO6156 G C Drastic NA GOOOOOO6719 V9083762 nonsynonymous MUC4 CMX- CMX- chr3:195SOS897 G A. Drastic NA GOOOOOO6719 V9083763 nonsynonymous MUC4 CMX- CMX- chr3:1955O6146 A G Drastic NA GOOOOOO6719 V9083764 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6149 C T Drastic NA GOOOOOO6719 V9083765 nonsynonymous MUC4 CMX- CMX- chr3:195SO6281 A G Drastic NA GOOOOOO6719 V9083766 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6291 C T Drastic NA GOOOOOO6719 V9083767 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6302 G T Drastic NA GOOOOOO6719 V9083768 nonsynonymous MUC4 CMX- CMX- chr3:195SO624S C A. Drastic NA GOOOOOO6719 V9083769 nonsynonymous MUC4 CMX- CMX- chr3:1954.95916 G C Drastic NA GOOOOOO6719 V9083770 nonsynonymous MUC4 CMX- CMX- chr3:195S06318 C G Drastic NA GOOOOOO6719 V9083771 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6323 G C Drastic NA GOOOOOO6719 V9083772 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6339 T G Drastic NA GOOOOOO6719 V9083773 nonsynonymous MUC4 CMX- CMX- chr3:195SO 63SO G T Drastic NA GOOOOOO6719 V9083774 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6364 G C Drastic NA GOOOOOO6719 V9083775 nonsynonymous MUC4 CMX- CMX- chr3:195SO618S G A. Drastic NA GOOOOOO6719 V9083776 nonsynonymous MUC4 CMX- CMX- chr3:195SO6195 C T Drastic NA GOOOOOO6719 V9083777 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6398 G T Drastic NA GOOOOOO6719 V9083778 nonsynonymous MUC4 CMX CMX- chr3:195S06410 G A. 
Drastic NA GOOOOOO6719 V9083779 nonsynonymous MUC4 CMX- CMX- chr3:195SO6411 C T Drastic NA GOOOOOO6719 V908378O nonsynonymous MUC4 CMX- CMX- chr3:1955O6446 G. T Drastic NA GOOOOOO6719 V9083.781 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6460 G C Drastic NA GOOOOOO6719 V9083782 nonsynonymous MUC4 CMX- CMX- chr3:195SO)6OOS A C Drastic NA GOOOOOO6719 V9083.783 nonsynonymous MUC4 CMX- CMX- chr3:195506521 G A. Drastic NA GOOOOOO6719 V9083784 nonsynonymous MUC4 CMX- CMX- chr3:195SO6533 C A. Drastic NA GOOOOOO6719 V9083785 nonsynonymous MUC4 CMX- CMX- chr3:195SO6542 G T Drastic NA GOOOOOO6719 V9083786 nonsynonymous MUC4 CMX- CMX- chr3:195SOS788 G C Drastic NA GOOOOOO6719 V9083787 nonsynonymous MUC4 CMX- CMX- chr3:195SO6558 G C Drastic NA GOOOOOO6719 V9083788 nonsynonymous US 2015/O 142331 A1 May 21, 2015 17


MUC4 CMX- CMX- chr3:195506590 G A. Drastic NA GOOOOOO6719 V9083.789 nonsynonymous MUC4 CMX- CMX- chr3:195506597 G A. Drastic NA GOOOOOO6719 V908.3790 nonsynonymous MUC4 CMX- CMX- chr3:195SOS906 G A. Drastic NA GOOOOOO6719 V908.3791 nonsynonymous MUC4 CMX- CMX- chr3:195SO6626 G A. Drastic NA GOOOOOO6719 V908.3792 nonsynonymous MUC4 CMX- CMX- chr3:195SO6627 T G Drastic NA GOOOOOO6719 V908.3793 nonsynonymous MUC4 CMX- CMX- chr3:195SO674.0 G. C Drastic NA GOOOOOO6719 V908.3794 nonsynonymous MUC4 CMX- CMX- chr3:195SO6746 G A. Drastic NA GOOOOOO6719 V9083.795 nonsynonymous MUC4 CMX- CMX- chr3:195SO6494 G T Drastic NA GOOOOOO6719 V908.3796 nonsynonymous MUC4 CMX- CMX- chr3:195SO67SO G C Drastic NA GOOOOOO6719 V908.3797 nonsynonymous MUC4 CMX- CMX- chr3:195506752 C T Drastic NA GOOOOOO6719 V908.3798 nonsynonymous MUC4 CMX- CMX- chr3:195506753 G C Drastic NA GOOOOOO6719 V908.3799 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6809 G T Drastic NA GOOOOOO6719 V9083800 nonsynonymous MUC4 CMX- CMX- chr3:195SO6914 G A. Drastic NA GOOOOOO6719 V90838O1 nonsynonym MUC4 CMX- CMX- chr3:195SO6917 A C Drastic NA GOOOOOO6719 V90838O2 nonsynonym MUC4 CMX- CMX- chr3:195SO6933 G A. Drastic NA GOOOOOO6719 V90838O3 nonsynonymous MUC4 CMX- CMX- chr3:195SO6940 G C Drastic NA GOOOOOO6719 V9083804 nonsynonymous MUC4 CMX- CMX- chr3:195SO6953 G A. Drastic NA GOOOOOO6719 V9083805 nonsynonymous MUC4 CMX- CMX- chr3:195SO6965 T C Drastic NA GOOOOOO6719 V9083806 nonsynonymous MUC4 CMX- CMX- chr3:195SO6966 C T Drastic NA GOOOOOO6719 V9083807 nonsynonymous MUC4 CMX- CMX- chr3:195SO697S G C Drastic NA GOOOOOO6719 V90838O8 nonsynonymous MUC4 CMX- CMX- chr3:195SO6747 C T Drastic NA GOOOOOO6719 V9083.809 nonsynonymous MUC4 CMX- CMX- chr3:195SO6986 G A. 
Drastic NA GOOOOOO6719 V9083810 nonsynonymous MUC4 CMX- CMX- chr3:195SO6987 T C Drastic NA GOOOOOO6719 V9083811 nonsynonymous MUC4 CMX- CMX- chr3:1955.06990 C G Drastic NA GOOOOOO6719 V9083,812 nonsynonymous MUC4 CMX- CMX- chr3:195S07010 A G Drastic NA GOOOOOO6719 V9083813 nonsynonymous MUC4 CMX- CMX- chr3:195SO7059 T C Drastic NA GOOOOOO6719 V9083814 nonsynonymous MUC4 CMX- CMX- chr3:195S07062 C T Drastic NA GOOOOOO6719 V9083815 nonsynonymous MUC4 CMX- CMX chr3:195SO 6378 C A. Drastic NA GOOOOOO6719 V9083816 nonsynonymous MUC4 CMX- CMX- chr3:195SO7083 T C Drastic NA GOOOOOO6719 V9083817 nonsynonymous MUC4 CMX- CMX- chr3:195SO7086 C G Drastic NA GOOOOOO6719 V9083818 nonsynonymous MUC4 CMX- CMX- chr3:195SO7107 C T Drastic NA GOOOOOO6719 V9083819 nonsynonymous MUC4 CMX- CMX- chr3:195SO7166 A G Drastic NA GOOOOOO6719 V908382O nonsynonymous MUC4 CMX- CMX- chr3:195SO72O3 T G Drastic NA GOOOOOO6719 V9083821 nonsynonymous MUC4 CMX- CMX- chr3:195SO7226 A G Drastic NA GOOOOOO6719 V9083,822 nonsynonymous



MUC4 CMX- CMX- chr3:1955.07694 A G Drastic NA GOOOOOO6719 V9083857 nonsynonymous MUC4 CMX- CMX- chr3:195SO7731 G A. Drastic NA GOOOOOO6719 V9083858 nonsynonymous MUC4 CMX- CMX- chr3:1955O7779 C T Drastic NA GOOOOOO6719 V9083,859 nonsynonymous MUC4 CMX- CMX- chr3:195SO7790 G A. Drastic NA GOOOOOO6719 V9083,860 nonsynonymous MUC4 CMX- CMX- chr3:195SO7827 G A. Drastic NA GOOOOOO6719 V9083,861 nonsynonymous MUC4 CMX- CMX- chr3:195474159 G A. Drastic NA GOOOOOO6719 V9083,862 nonsynonymous MUC4 CMX- CMX- chr3:195477786 C T Drastic NA GOOOOOO6719 V9083863 nonsynonymous MUC4 CMX- CMX- chr3:195489009 C A. Drastic NA GOOOOOO6719 V9083,864 nonsynonymous MUC4 CMX- CMX- chr3:195SO801.9 G. C Drastic NA GOOOOOO6719 V908386S nonsynonymous MUC4 CMX- CMX- chr3:195508021 C T Drastic NA GOOOOOO6719 V9083866 nonsynonymous MUC4 CMX- CMX- chr3:195S08069 T C Drastic NA GOOOOOO6719 V9083867 nonsynonymous MUC4 CMX- CMX- chr3:195S08070 C T Drastic NA GOOOOOO6719 V9083,868 nonsynonymous MUC4 CMX- CMX- chr3:195S08091. 
T C Drastic NA GOOOOOO6719 V9083.869 nonsynonymous MUC4 CMX- CMX- chr3:195SOS886 C G Drastic NA GOOOOOO6719 V9083,870 nonsynonymous MUC4 CMX- CMX- chr3:195S08115 T G Drastic NA GOOOOOO6719 V9083,871 nonsynonymous MUC4 CMX- CMX- chr3:195SO8127 G C Drastic NA GOOOOOO6719 V9083,872 nonsynonymous MUC4 CMX- CMX- chr3:195SOS907 T G Drastic NA GOOOOOO6719 V9083,873 nonsynonymous MUC4 CMX- CMX- chr3:195SOS930 C G Drastic NA GOOOOOO6719 V9083,874 nonsynonymous MUC4 CMX- CMX- chr3:195SOS955 C T Drastic NA GOOOOOO6719 V9083,875 nonsynonymous MUC4 CMX- CMX- chr3:195SO8336 C T Drastic NA GOOOOOO6719 V9083,876 nonsynonymous MUC4 CMX- CMX- chr3:195SOS979 T C Drastic NA GOOOOOO6719 V9083877 nonsynonymous MUC4 CMX- CMX- chr3:195SO8451 G T Drastic NA GOOOOOO6719 V9083878 nonsynonymous MUC4 CMX- CMX- chr3:195SO8453 C T Drastic NA GOOOOOO6719 V9083879 nonsynonymous MUC4 CMX- CMX- chr3:1955.0847S C T Drastic NA GOOOOOO6719 V908388O nonsynonymous MUC4 CMX- CMX- chr3:1955.08478 G C Drastic NA GOOOOOO6719 V9083,881 nonsynonymous MUC4 CMX- CMX- chr3:195SO8SOO G C Drastic NA GOOOOOO6719 V9083882 nonsynonymous MUC4 CMX- CMX- chr3:195SO85O1 T C Drastic NA GOOOOOO6719 V9083883 nonsynonymous MUC4 CMX- CMX- chr3:195SO85O2 C T Drastic NA GOOOOOO6719 V9083.884 nonsynonymous MUC4 CMX- CMX- chr3:195SO8523 C T Drastic NA GOOOOOO6719 V9083.885 nonsynonymous MUC4 CMX- CMX- chr3:195S08536 G C Drastic NA GOOOOOO6719 V9083886 nonsynonymous MUC4 CMX- CMX- chr3:195SO8667 T C Drastic NA GOOOOOO6719 V9083887 nonsynonymous MUC4 CMX- CMX- chr3:195SO8668 G C Drastic NA GOOOOOO6719 V9083888 nonsynonymous MUC4 CMX- CMX- chr3:195SO8702 G A. Drastic NA GOOOOOO6719 V9083889 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6311 G C Drastic NA GOOOOOO6719 V9083,890 nonsynonymous US 2015/O 142331 A1 May 21, 2015 20


MUC4 CMX- CMX- chr3:195SO 631S T C Drastic NA GOOOOOO6719 V9083891 nonsynonymous MUC4 CMX- CMX- chr3:195SO8787 G T Drastic NA GOOOOOO6719 V9083,892 nonsynonymous MUC4 CMX- CMX- chr3:195SO8789 C T Drastic NA GOOOOOO6719 V9083,893 nonsynonymous MUC4 CMX- CMX- chr3:195SO9092 C T Drastic NA GOOOOOO6719 V9083,894 nonsynonymous MUC4 CMX- CMX- chr3:195SO9093 G A. Drastic NA GOOOOOO6719 V9083.895 nonsynonymous MUC4 CMX- CMX- chr3:195SO9099 T C Drastic NA GOOOOOO6719 V9083,896 nonsynonymous MUC4 CMX- CMX- chr3:195S09102 G C Drastic NA GOOOOOO6719 V9083897 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6389 C T Drastic NA GOOOOOO6719 V9083898 nonsynonymous MUC4 CMX- CMX- chr3:195SO9212 G A. Drastic NA GOOOOOO6719 V9083,899 nonsynonymous MUC4 CMX- CMX- chr3:195SO9287 T G Drastic NA GOOOOOO6719 V9083900 nonsynonymous MUC4 CMX- CMX- chr3:195SO9353 G A. Drastic NA GOOOOOO6719 V9083901 nonsynonymous MUC4 CMX- CMX- chr3:195SO9354 C T Drastic NA GOOOOOO6719 V9083902 nonsynonymous MUC4 CMX- CMX- chr3:195SO9363 G T Drastic NA GOOOOOO6719 V9083903 nonsynonymous MUC4 CMX- CMX- chr3:195SO936S C T Drastic NA GOOOOOO6719 V9083904 nonsynonymous MUC4 CMX- CMX- chr3:195SO9374 T G Drastic NA GOOOOOO6719 V9083.905 nonsynonymous MUC4 CMX CMX- chr3:195SO9378 G C Drastic NA GOOOOOO6719 V9083906 nonsynonymous MUC4 CMX- CMX- chr3:195SO9423 G A. Drastic NA GOOOOOO6719 V9083907 nonsynonymous MUC4 CMX- CMX- chr3:195S06SS4 G A. Drastic NA GOOOOOO6719 V9083908 nonsynonymous MUC4 CMX- CMX- chr3:195SO9563 A T Drastic NA GOOOOOO6719 V9083909 nonsynonymous MUC4 CMX- CMX- chr3:195SO9573 A G Drastic NA GOOOOOO6719 V9083910 nonsynonymous MUC4 CMX- CMX- chr3:195SO9606 C T Drastic NA GOOOOOO6719 V9083911 nonsynonymous MUC4 CMX- CMX- chr3:195SO6617 G A. Drastic NA GOOOOOO6719 V9083912 nonsynonymous MUC4 CMX- CMX- chr3:195SO9627 T C Drastic NA GOOOOOO6719 V9083.913 nonsynonymous MUC4 CMX- CMX- chr3:195SO9651 G A. 
Drastic NA GOOOOOO6719 V9083914 nonsynonymous MUC4 CMX- CMX- chr3:195SO9756 G C Drastic NA GOOOOOO6719 V9083915 nonsynonymous MUC4 CMX- CMX- chr3:195SO979S C T Drastic NA GOOOOOO6719 V9083916 nonsynonymous MUC4 CMX- CMX- chr3:195SO9861 A G Drastic NA GOOOOOO6719 V9083917 nonsynonymous MUC4 CMX- CMX- chr3:195SO9879 A G Drastic NA GOOOOOO6719 V9083918 nonsynonymous MUC4 CMX- CMX- chr3:195SO9918 G. C Drastic NA GOOOOOO6719 V9083919 nonsynonymous MUC4 CMX- CMX- chr3:195SO9939 G T Drastic NA GOOOOOO6719 V908392O nonsynonymous MUC4 CMX- CMX- chr3:1955O9941 A C Drastic NA GOOOOOO6719 V9083921 nonsynonymous MUC4 CMX- CMX- chr3:195SO9954 G C Drastic NA GOOOOOO6719 V9083922 nonsynonymous MUC4 CMX- CMX- chr3:195SO9957 A G Drastic NA GOOOOOO6719 V9083923 nonsynonymous MUC4 CMX- CMX- chr3:195SO9974 A G Drastic NA GOOOOOO6719 V9083924 nonsynonymous US 2015/O 142331 A1 May 21, 2015 21


MUC4 CMX- CMX- chr3:19551 OO68 T A. Drastic NA GOOOOOO6719 V9083925 nonsynonymous MUC4 CMX- CMX- chr3:19551 0083 G T Drastic NA GOOOOOO6719 V9083926 nonsynonymous MUC4 CMX- CMX- chr3:19551 O146 G C Drastic NA GOOOOOO6719 V9083927 nonsynonymous MUC4 CMX- CMX- chr3:195510194 G C Drastic NA GOOOOOO6719 V9083928 nonsynonymous MUC4 CMX- CMX- chr3:195510590 C G Drastic NA GOOOOOO6719 V9083929 nonsynonymous MUC4 CMX- CMX- chr3:195SO 6983 G A. Drastic NA GOOOOOO6719 V9083930 nonsynonymous MUC4 CMX- CMX- chr3:1955106SS T G Drastic NA GOOOOOO6719 V9083931 nonsynonymous MUC4 CMX- CMX- chr3:195510659 T C Drastic NA GOOOOOO6719 V9083932 nonsynonymous MUC4 CMX- CMX- chr3:195510662 C T Drastic NA GOOOOOO6719 V9083933 nonsynonymous MUC4 CMX- CMX- chr3:195510683 T C Drastic NA GOOOOOO6719 V9083934 nonsynonymous MUC4 CMX- CMX- chr3:195510686 C G Drastic NA GOOOOOO6719 V9083935 nonsynonymous MUC4 CMX- CMX- chr3:195510697 G A. Drastic NA GOOOOOO6719 V9083936 nonsynonymous MUC4 CMX- CMX- chr3:19551 0706 G A. Drastic NA GOOOOOO6719 V9083937 nonsynonymous MUC4 CMX- CMX- chr3:19551 0707 T G Drastic NA GOOOOOO6719 V9083938 nonsynonymous MUC4 CMX- CMX- chr3:19551 0709 C T Drastic NA GOOOOOO6719 V9083939 nonsynonymous MUC4 CMX- CMX- chr3:19551 0718 G. T Drastic NA GOOOOOO6719 V9083940 nonsynonymous MUC4 CMX- CMX- chr3:19551 0745 G A. Drastic NA GOOOOOO6719 V9083941 nonsynonymous MUC4 CMX- CMX- chr3:19551 0749 C A. Drastic NA GOOOOOO6719 V9083942 nonsynonymous MUC4 CMX- CMX chr3:19551 0766 G T Drastic NA GOOOOOO6719 V9083943 nonsynonymous MUC4 CMX- CMX chr3:195510767 G. A. Drastic NA GOOOOOO6719 V908394.4 nonsynonymous MUC4 CMX- CMX- chr3:19551 O773 A G Drastic NA GOOOOOO6719 V9083945 nonsynonymous MUC4 CMX- CMX- chr3:195510827 C T Drastic NA GOOOOOO6719 V908394.6 nonsynonymous MUC4 CMX- CMX- chr3:195510896 G A. 
Drastic NA GOOOOOO6719 V9083947 nonsynonymous MUC4 CMX- CMX- chr3:195510899 T C Drastic NA GOOOOOO6719 V9083948 nonsynonymous MUC4 CMX- CMX- chr3:195510910 G T Drastic NA GOOOOOO6719 V9083949 nonsynonymous MUC4 CMX- CMX- chr3:195510943 G T Drastic NA GOOOOOO6719 V90839SO nonsynonymous MUC4 CMX- CMX- chr3:195511013 G A. Drastic NA GOOOOOO6719 V9083951 nonsynonymous MUC4 CMX- CMX- chr3:19551101.9 T C Drastic NA GOOOOOO6719 V9083952 nonsynonymous MUC4 CMX- CMX- chr3:19551 1043 T C Drastic NA GOOOOOO6719 V9083953 nonsynonymous MUC4 CMX- CMX- chr3:195511051 C A. Drastic NA GOOOOOO6719 V90839S4 nonsynonymous MUC4 CMX- CMX- chr3:195511070 C G Drastic NA GOOOOOO6719 V90839SS nonsynonymous MUC4 CMX- CMX- chr3:195511076 T A. Drastic NA GOOOOOO6719 V9083956 nonsynonymous US 2015/O 142331 A1 May 21, 2015 22


MUC4 CMX- CMX- chr3:195511102 G A. Drastic NA GOOOOOO6719 V9083957 nonsynonymous MUC4 CMX- CMX- chr3:195511142 T C Drastic NA GOOOOOO6719 V9083958 nonsynonymous MUC4 CMX- CMX- chr3:195511156 C G Drastic NA GOOOOOO6719 V9083959 nonsynonymous MUC4 CMX- CMX- chr3:195511186 A G Drastic NA GOOOOOO6719 V9083960 nonsynonymous MUC4 CMX- CMX- chr3:195511190 C T Drastic NA GOOOOOO6719 V9083961 nonsynonymous MUC4 CMX- CMX- chr3:19551 1204 T G Drastic NA GOOOOOO6719 V9083962 nonsynonymous MUC4 CMX- CMX- chr3:195511211 C T Drastic NA GOOOOOO6719 V9083963 nonsynonymous MUC4 CMX- CMX- chr3:195511214 G C Drastic NA GOOOOOO6719 V9083964 nonsynonymous MUC4 CMX- CMX- chr3:195511268 T A. Drastic NA GOOOOOO6719 V908396S nonsynonymous MUC4 CMX- CMX- chr3:195511273 G A. Drastic NA GOOOOOO6719 V9083966 nonsynonymous MUC4 CMX- CMX- chr3:19551128S T C Drastic NA GOOOOOO6719 V9083967 nonsynonymous MUC4 CMX- CMX- chr3:195511286 T C Drastic NA GOOOOOO6719 V9083968 nonsynonymous MUC4 CMX- CMX- chr3:19551.1331 A G Drastic NA GOOOOOO6719 V9083969 nonsynonymous MUC4 CMX- CMX- chr3:195511336 G C Drastic NA GOOOOOO6719 V9083970 nonsynonymous MUC4 CMX- CMX- chr3:195511358 C G Drastic NA GOOOOOO6719 V9083971 nonsynonymous MUC4 CMX- CMX- chr3:195511390 C G Drastic NA GOOOOOO6719 V9083972 nonsynonymous MUC4 CMX- CMX- chr3:195511396 G A. Drastic NA GOOOOOO6719 V9083973 nonsynonymous MUC4 CMX- CMX- chr3:195511403 C T Drastic NA GOOOOOO6719 V9083974 nonsynonymous MUC4 CMX- CMX- chr3:195511412 T A. Drastic NA GOOOOOO6719 V9083975 nonsynonymous MUC4 CMX- CMX- chr3:195511438 G T Drastic NA GOOOOOO6719 V908.3976 nonsynonymous MUC4 CMX- CMX- chr3:195SO7683 C T Drastic NA GOOOOOO6719 V9083977 nonsynonymous MUC4 CMX- CMX- chr3:195511454 C G Drastic NA GOOOOOO6719 V9083978 nonsynonymous MUC4 CMX- CMX- chr3:195511460 T A. Drastic NA GOOOOOO6719 V908.3979 nonsynonymous MUC4 CMX- CMX- chr3:19551146S G A. Drastic NA GOOOOOO6719 V908398O nonsynonymous MUC4 CMX- CMX- chr3:195511474. 
A G Drastic NA GOOOOOO6719 V9083981 nonsynonymous MUC4 CMX- CMX- chr3:19551 1486 G T Drastic NA GOOOOOO6719 V9083982 nonsynonymous MUC4 CMX- CMX- chr3:195SO792S C T Drastic NA GOOOOOO6719 V9083983 nonsynonymous MUC4 CMX- CMX- chr3:195SO8009 G A. Drastic NA GOOOOOO6719 V9083984 nonsynonymous MUC4 CMX- CMX- chr3:195SO8010 C A. Drastic NA GOOOOOO6719 V9083985 nonsynonymous MUC4 CMX- CMX- chr3:195511513 G A. Drastic NA GOOOOOO6719 V9083986 nonsynonymous MUC4 CMX- CMX- chr3:195511525 T C Drastic NA GOOOOOO6719 V9083987 nonsynonymous MUC4 CMX- CMX- chr3:195511526 C T Drastic NA GOOOOOO6719 V9083988 nonsynonymous MUC4 CMX- CMX- chr3:195511534 T G Drastic NA GOOOOOO6719 V9083989 nonsynonymous MUC4 CMX- CMX- chr3:195511547 C T Drastic NA GOOOOOO6719 V9083990 nonsynonymous US 2015/O 142331 A1 May 21, 2015 23


MUC4 CMX- CMX- chr3:195SO8108 G A. Drastic NA GOOOOOO6719 V9083991 nonsynonymous MUC4 CMX- CMX- chr3:195511690 G C Drastic NA GOOOOOO6719 V9083992 nonsynonymous MUC4 CMX- CMX- chr3:1955117OS G A. Drastic NA GOOOOOO6719 V9083993 nonsynonymous MUC4 CMX- CMX- chr3:195SO817S G C Drastic NA GOOOOOO6719 V9083994 nonsynonymous MUC4 CMX- CMX- chr3:195SO8178 G C Drastic NA GOOOOOO6719 V9083995 nonsynonymous MUC4 CMX- CMX- chr3:1955.08.238 C G Drastic NA GOOOOOO6719 V9083996 nonsynonymous MUC4 CMX- CMX- chr3:195511822 G T Drastic NA GOOOOOO6719 V9083997 nonsynonymous MUC4 CMX- CMX- chr3:1955.08402 G T Drastic NA GOOOOOO6719 V9083998 nonsynonymous MUC4 CMX- CMX- chr3:195511870 G A. Drastic NA GOOOOOO6719 V9083999 nonsynonymous MUC4 CMX- CMX- chr3:195511877 G A. Drastic NA GOOOOOO6719 V9084OOO nonsynonymous MUC4 CMX- CMX- chr3:195511918 G. T Drastic NA GOOOOOO6719 V9084001 nonsynonymous MUC4 CMX- CMX- chr3:195511925 A G Drastic NA GOOOOOO6719 V90840O2 nonsynonymous MUC4 CMX- CMX- chr3:195511937 C T Drastic NA GOOOOOO6719 V9084003 nonsynonymous MUC4 CMX- CMX- chr3:195512042 T C Drastic NA GOOOOOO6719 V9084004 nonsynonymous MUC4 CMX- CMX- chr3:195512107 T A. Drastic NA GOOOOOO6719 V9084OOS nonsynonymous MUC4 CMX- CMX- chr3:195512117 C G Drastic NA GOOOOOO6719 V9084OO6 nonsynonymous MUC4 CMX- CMX- chr3:195512195 C T Drastic NA GOOOOOO6719 V9084.007 nonsynonymous MUC4 CMX- CMX- chr3:195512206 A G Drastic NA GOOOOOO6719 V90840O8 nonsynonymous MUC4 CMX- CMX- chr3:19551 2212 G T Drastic NA GOOOOOO6719 V9084009 nonsynonymous MUC4 CMX- CMX- chr3:19551 2242 G A. Drastic NA GOOOOOO6719 V9084010 nonsynonymous MUC4 CMX- CMX- chr3:195SO8774 G T Drastic NA GOOOOOO6719 V9084O11 nonsynonymous MUC4 CMX- CMX- chr3:195S08786. A G Drastic NA GOOOOOO6719 V9084O12 nonsynonymous MUC4 CMX- CMX- chr3:195512267 T C Drastic NA GOOOOOO6719 V9084O13 nonsynonymous MUC4 CMX- CMX- chr3:19551 2270 C G Drastic NA GOOOOOO6719 V9084O14 nonsynonymous MUC4 CMX- CMX- chr3:195512287 G A. 
Drastic NA GOOOOOO6719 V9084O15 nonsynonymous MUC4 CMX- CMX- chr3:195512302 G A. Drastic NA GOOOOOO6719 V9084O16 nonsynonymous MUC4 CMX- CMX- chr3:195512S67 G. A. Drastic NA GOOOOOO6719 V9084O17 nonsynonymous MUC4 CMX- CMX- chr3:195512597 G A. Drastic NA GOOOOOO6719 V9084.018 nonsynonymous MUC4 CMX- CMX- chr3:195SO917O A G Drastic NA GOOOOOO6719 V9084019 nonsynonymous MUC4 CMX- CMX- chr3:195512606 G C Drastic NA GOOOOOO6719 V9084020 nonsynonymous MUC4 CMX- CMX- chr3:19551266S G A. Drastic NA GOOOOOO6719 V9084021 nonsynonymous MUC4 CMX- CMX- chr3:195512686 G T Drastic NA GOOOOOO6719 V9084022 nonsynonymous MUC4 CMX- CMX- chr3:195512693 A G Drastic NA GOOOOOO6719 V9084023 nonsynonymous MUC4 CMX- CMX- chr3:195512767 T G Drastic NA GOOOOOO6719 V9084024 nonsynonymous US 2015/O 142331 A1 May 21, 2015 24


MUC4 CMX- CMX- chr3:195512768 T A. Drastic NA GOOOOOO6719 V9084025 nonsynonymous MUC4 CMX- CMX- chr3:19551.3136 G C Drastic NA GOOOOOO6719 V9084026 nonsynonymous MUC4 CMX- CMX- chr3:1955.13154 G T Drastic NA GOOOOOO6719 V9084027 nonsynonymous MUC4 CMX- CMX- chr3:19551.3155 T C Drastic NA GOOOOOO6719 V9084028 nonsynonymous MUC4 CMX- CMX- chr3:195SO9476 A G Drastic NA GOOOOOO6719 V9084029 nonsynonymous MUC4 CMX- CMX- chr3:1955132O3 C T Drastic NA GOOOOOO6719 V9084030 nonsynonymous MUC4 CMX- CMX- chr3:195513214 A G Drastic NA GOOOOOO6719 V9084.031 nonsynonymous MUC4 CMX- CMX- chr3:195513364 C T Drastic NA GOOOOOO6719 V9084032 nonsynonymous MUC4 CMX- CMX- chr3:195SO9614 G A. Drastic NA GOOOOOO6719 V9084033 nonsynonymous MUC4 CMX- CMX- chr3:195513383 T A. Drastic NA GOOOOOO6719 V9084034 nonsynonymous MUC4 CMX- CMX- chr3:195513394. A T Drastic NA GOOOOOO6719 V9084O3S nonsynonymous MUC4 CMX- CMX- chr3:19551339S G T Drastic NA GOOOOOO6719 V9084.036 nonsynonymous MUC4 CMX- CMX- chr3:195513397 C T Drastic NA GOOOOOO6719 V9084037 nonsynonymous MUC4 CMX- CMX- chr3:195513398 C T Drastic NA GOOOOOO6719 V9084038 nonsynonymous MUC4 CMX- CMX- chr3:1955.13413 G A. Drastic NA GOOOOOO6719 V9084039 nonsynonymous MUC4 CMX- CMX- chr3:195513433 G A. Drastic NA GOOOOOO6719 V9084040 nonsynonymous MUC4 CMX- CMX- chr3:195513442 G T Drastic NA GOOOOOO6719 V9084O41 nonsynonymous MUC4 CMX- CMX- chr3:195513445 C T Drastic NA GOOOOOO6719 V9084O42 nonsynonymous MUC4 CMX- CMX- chr3:195513461 G A. Drastic NA GOOOOOO6719 V9084O43 nonsynonymous MUC4 CMX- CMX- chr3:195513491 G T Drastic NA GOOOOOO6719 V9084O44 nonsynonymous MUC4 CMX- CMX- chr3:195513502 T G Drastic NA GOOOOOO6719 V9084O45 nonsynonymous MUC4 CMX- CMX- chr3:195513515 C T Drastic NA GOOOOOO6719 V9084046 nonsynonymous MUC4 CMX- CMX- chr3:195513598 G A. 
Drastic NA GOOOOOO6719 V9084O47 nonsynonymous MUC4 CMX- CMX- chr3:1955.13667 T G Drastic NA GOOOOOO6719 V9084O48 nonsynonymous MUC4 CMX- CMX- chr3:195513743 G T Drastic NA GOOOOOO6719 V9084O49 nonsynonymous MUC4 CMX- CMX- chr3:195513779 C T Drastic NA GOOOOOO6719 V9084OSO nonsynonymous MUC4 CMX- CMX- chr3:195510649 G A. Drastic NA GOOOOOO6719 V9084OS1 nonsynonymous MUC4 CMX- CMX- chr3:195513991 G A. Drastic NA GOOOOOO6719 V9084052 nonsynonymous MUC4 CMX- CMX- chr3:195514109 C A. Drastic NA GOOOOOO6719 V9084OS3 nonsynonymous MUC4 CMX- CMX- chr3:195514144 T C Drastic NA GOOOOOO6719 V9084OS4 nonsynonymous MUC4 CMX- CMX- chr3:195514324 G A. Drastic NA GOOOOOO6719 V9084OSS nonsynonymous MUC4 CMX- CMX- chr3:195514379 T C Drastic NA GOOOOOO6719 V90840S6 nonsynonymous MUC4 CMX- CMX- chr3:195514403 C T Drastic NA GOOOOOO6719 V9084.057 nonsynonymous MUC4 CMX- CMX- chr3:195514643 T G Drastic NA GOOOOOO6719 V9084058 nonsynonymous US 2015/O 142331 A1 May 21, 2015 25


MUC4 CMX- CMX- chr3:195514645 T C Drastic NA GOOOOOO6719 V9084059 nonsynonymous MUC4 CMX- CMX- chr3:195514646 C T Drastic NA GOOOOOO6719 V9084O60 nonsynonymous MUC4 CMX- CMX- chr3:1955.14654. A G Drastic NA GOOOOOO6719 V9084061 nonsynonymous MUC4 CMX- CMX- chr3:195514661 A G Drastic NA GOOOOOO6719 V9084O62 nonsynonymous MUC4 CMX- CMX- chr3:195514718 G. C Drastic NA GOOOOOO6719 V9084O63 nonsynonymous MUC4 CMX- CMX- chr3:195514729 G A. Drastic NA GOOOOOO6719 V9084O64 nonsynonymous MUC4 CMX- CMX- chr3:195514733 C A. Drastic NA GOOOOOO6719 V9084O65 nonsynonymous MUC4 CMX- CMX- chr3:195514741 A C Drastic NA GOOOOOO6719 V9084O66 nonsynonymous MUC4 CMX- CMX- chr3:195514757 A G Drastic NA GOOOOOO6719 V9084O67 nonsynonymous MUC4 CMX- CMX- chr3:195514805 G. A. Drastic NA GOOOOOO6719 V9084O68 nonsynonymous MUC4 CMX- CMX- chr3:195514811 C T Drastic NA GOOOOOO6719 V9084O69 nonsynonymous MUC4 CMX- CMX- chr3:195514812 G C Drastic NA GOOOOOO6719 V9084O70 nonsynonymous MUC4 CMX- CMX- chr3:19551482S G A. Drastic NA GOOOOOO6719 V9084O71 nonsynonymous MUC4 CMX- CMX- chr3:195514846 A G Drastic NA GOOOOOO6719 V9084O72 nonsynonymous MUC4 CMX- CMX- chr3:195514859 C T Drastic NA GOOOOOO6719 V9084O73 nonsynonymous MUC4 CMX- CMX- chr3:195514862 G. C Drastic NA GOOOOOO6719 V9084O74 nonsynonymous MUC4 CMX- CMX- chr3:195514873 G A. Drastic NA GOOOOOO6719 V9084O75 nonsynonymous MUC4 CMX- CMX- chr3:195514882 G A. Drastic NA GOOOOOO6719 V9084O76 nonsynonymous MUC4 CMX- CMX- chr3:195514930 A G Drastic NA GOOOOOO6719 V9084077 nonsynonymous MUC4 CMX- CMX- chr3:195514948 G A. Drastic NA GOOOOOO6719 V9084O78 nonsynonymous MUC4 CMX- CMX- chr3:195514969 G. A. Drastic NA GOOOOOO6719 V9084O79 nonsynonymous MUC4 CMX- CMX- chr3:19551 SOO3 T C Drastic NA GOOOOOO6719 V908408O nonsynonymous MUC4 CMX- CMX- chr3:19551 SOO8 C G Drastic NA GOOOOOO6719 V9084081 nonsynonymous MUC4 CMX- CMX- chr3:19551 SO38 G A. 
Drastic NA GOOOOOO6719 V9084082 nonsynonymous MUC4 CMX- CMX- chr3:19551SO4S A G Drastic NA GOOOOOO6719 V9084083 nonsynonymous MUC4 CMX- CMX- chr3:195515122 G C Drastic NA GOOOOOO6719 V9084084 nonsynonymous MUC4 CMX- CMX- chr3:195515134 G T Drastic NA GOOOOOO6719 V9084085 nonsynonymous MUC4 CMX- CMX- chr3:195515141 A G Drastic NA GOOOOOO6719 V9084086 nonsynonymous MUC4 CMX- CMX- chr3:195515194 G C Drastic NA GOOOOOO6719 V9084087 nonsynonymous MUC4 CMX- CMX- chr3:195515387 T C Drastic NA GOOOOOO6719 V9084.088 nonsynonymous MUC4 CMX- CMX- chr3:195515411 G T Drastic NA GOOOOOO6719 V9084089 nonsynonymous MUC4 CMX- CMX- chr3:195515413 C T Drastic NA GOOOOOO6719 V9084O90 nonsynonymous MUC4 CMX- CMX- chr3:195515449 A T Drastic NA GOOOOOO6719 V9084.091 nonsynonymous MUC4 CMX- CMX- chr3:195515459 C T Drastic NA GOOOOOO6719 V9084O92 nonsynonymous US 2015/O 142331 A1 May 21, 2015 26

TABLE 7-continued List of Biologically and Statistically Significant Genetic Variants Most Useful for Predicting Infertility Risk in Humans (arranged in alphabetical order by SCIle Ilale Celmatix Gene Celmatix Variant Symbol Gene ID ID Location Ref Alt impact P-value MUC4 CMX- CMX- chr3:195538901 C T Start codon NA GOOOOOO6719 V9084093 gained MUC4 CMX- CMX- chr3:19551 2246 T C Drastic NA GOOOOOO6719 V9084094 nonsynonymous MUC4 CMX- CMX- chr3:195511 SS6 T A. Drastic NA GOOOOOO6719 V9084095 nonsynonymous MUC4 CMX- CMX- chr3:195512603 T C Drastic NA GOOOOOO6719 V9084096 nonsynonymous MUC4 CMX- CMX- chr3:19551.3173 G A. Drastic NA GOOOOOO6719 V9084097 nonsynonymous MUC4 CMX- CMX- chr3:195511451 T C Drastic NA GOOOOOO6719 V9084098 nonsynonymous MUC4 CMX- CMX- chr3:195511781 G A. Drastic NA GOOOOOO6719 V9084O99 nonsynonymous MUC4 CMX- CMX- chr3:195511499 C T Drastic NA GOOOOOO6719 V90841OO nonsynonymous MUC4 CMX- CMX- chr3:19551336S G A. Drastic NA GOOOOOO6719 V9084101 nonsynonymous MUC4 CMX- CMX- chr3:195511780 G A. Drastic NA GOOOOOO6719 V9084102 nonsynonymous MUC4 CMX- CMX- chr3:195513826 G A. 
Drastic NA GOOOOOO6719 V9084103 nonsynonymous MUC4 CMX- CMX- chr3:19551 224S T C Drastic NA GOOOOOO6719 V9084104 nonsynonymous MUC4 CMX- CMX- chr3:195511SOO G C Drastic NA GOOOOOO6719 V90841OS nonsynonymous MUC4 CMX CMX- chr3:195511 SO2 G C Drastic NA GOOOOOO6719 V9084106 nonsynonymous MUC4 CMX- CMX- chr3:195511859 T G Drastic NA GOOOOOO6719 V9084107 nonsynonymous MUC4 CMX- CMX- chr3:195511783 A G Drastic NA GOOOOOO6719 V9084.108 nonsynonymous MUC4 CMX- CMX- chr3:195512373 G GGAT Codon NA GOOOOOO6719 SWOOOO2 change and codon insertion MUC4 CMX- CMX- chr3:195518112 T TGTC Codon NA GOOOOOO6719 SWOOOO3 TCCT change and GCGT codon AACA insertion MUC4 CMX- CMX- chr3:195464985 CNV NA Splice NA GOOOOOO6719 SWOOOO4 duplication acceptor variant MUC4 CMX- CMX- chr3:195SO7809 CNV NA Nonsynonymous NA GOOOOOO6719 SWOOOOS deletion and coding Sequence MUC4 CMX- CMX- chr3:195SO8499 CNV NA Frameshift NA GOOOOOO6719 SWOOOO6 duplication MUC4 CMX- CMX- chr3:1954998.47 A G NA 6.7SE-OS GOOOOOO6719 V9084187 MUC4 CMX- CMX- chr3:195SOO367 A G NA O.OOOS32SO9 GOOOOOO6719 V9084188 MUC4 CMX- CMX- chr3:195SO67SO G C NA O.OOO425548 GOOOOOO6719 V9084.191 MUC4 CMX- CMX- chr3:195SO 6760 T A. NA 7.68E-OS GOOOOOO6719 V9084.192 MUC4 CMX- CMX- chr3:195SO6195 C T NA 8.OOE-OS GOOOOOO6719 V9084189 MUC4 CMX- CMX- chr3:195SO6746 G A. NA O.OOO150373 GOOOOOO6719 V9084190 NLRP11 CMX- CMX- chr19:56323263 G A. Drastic NA GOOOOO281.88 V90841-10 nonsynonymous NLRP11 CMX- CMX- chr19:563294.47 G A. Drastic NA GOOOOO281.88 V9084111 nonsynonymous NLRP11 CMX- CMX- chr19:56343378 C A. Start codon NA GOOOOO281.88 V90841-12 gained NLRP14 CMX- CMX- chr11:7091569 C T Drastic NA GOOOOO16919 V90841-15 nonsynonymous



Gene Symbol  Celmatix Gene ID  Celmatix Variant ID  Location                  Ref  Alt       Impact  P-value
NA           NA                CMX-V2992446         chr21:44541166-44547084   NA   CNV loss  NA      0.000257622
NA           NA                CMX-V2992447         chrX:100110102-100110152  NA   CNV loss  NA      0.001445217
NA           NA                CMX-V2992448         chrX:152934795-152944222  NA   CNV loss  NA      0.000247877
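The row layout of Table 7 (gene symbol, gene ID, variant ID, location, reference allele, alternate allele, impact, P-value) can be held in a small record type. The following is a minimal sketch, not part of the application: the `VariantRow` class, the `parse_row` helper, and the field names are hypothetical, and the three sample rows are the CNV-loss entries closing the table, as read from the printed values.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VariantRow:
    """One row of a Table 7 style variant list (hypothetical field names)."""
    gene_symbol: Optional[str]
    gene_id: Optional[str]
    variant_id: str
    chrom: str
    start: int
    end: int
    ref: Optional[str]
    alt: Optional[str]
    impact: Optional[str]
    p_value: Optional[float]


# Matches "chr3:195506195" (single position) or "chr21:44541166-44547084" (range).
_LOCATION = re.compile(r"^(chr(?:\d+|X|Y|M)):(\d+)(?:-(\d+))?$")


def _na(value: str) -> Optional[str]:
    """Map the table's 'NA' placeholder to None."""
    return None if value == "NA" else value


def parse_row(fields: Sequence[str]) -> VariantRow:
    """Build a VariantRow from the eight column values of one table row."""
    symbol, gene_id, variant_id, location, ref, alt, impact, p_value = fields
    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    start = int(match.group(2))
    end = int(match.group(3)) if match.group(3) else start
    return VariantRow(
        gene_symbol=_na(symbol),
        gene_id=_na(gene_id),
        variant_id=variant_id,
        chrom=match.group(1),
        start=start,
        end=end,
        ref=_na(ref),
        alt=_na(alt),
        impact=_na(impact),
        p_value=None if p_value == "NA" else float(p_value),
    )


rows = [
    parse_row(["NA", "NA", "CMX-V2992446", "chr21:44541166-44547084",
               "NA", "CNV loss", "NA", "0.000257622"]),
    parse_row(["NA", "NA", "CMX-V2992447", "chrX:100110102-100110152",
               "NA", "CNV loss", "NA", "0.001445217"]),
    parse_row(["NA", "NA", "CMX-V2992448", "chrX:152934795-152944222",
               "NA", "CNV loss", "NA", "0.000247877"]),
]

# Row with the smallest reported P-value among the three sample rows.
smallest = min(rows, key=lambda r: r.p_value)
```

Storing the location as separate chromosome, start, and end fields makes it straightforward to sort rows by position or to compare the extent of copy-number variants, while `None` keeps the table's "NA" entries distinct from real values.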

Description of Certain Genes

[0060] Below are detailed descriptions of some of the fertility genes described in the tables above.

BARD1

[0061] BRCA1-Associated Ring Domain 1 (BARD1) is a gene that forms a heterodimer complex with the BRCA1 gene, and this complex is required for spindle-pole assembly in mitosis, and hence chromosome stability. Mouse embryos carrying homozygous null alleles for BARD1 died between embryonic day 7.5 and embryonic day 8.5 due to severely impaired cell proliferation (McCarthy et al. Molec. Cell. Biol. 23: 5056-5063, 2003).

[0062] KH domain containing 3-like, subcortical maternal complex member (KHDC3L). The gene also has the identifier "C6orf221" (Entrez Gene ID: 154288, HGNC ID: 33699). KH domains are protein domains that bind to RNA molecules, and KHDC3L is likely involved in genomic imprinting, a phenomenon where genes are expressed in a parental-origin specific manner. KHDC3L gene expression is maximal in germinal vesicle oocytes, tailing off through metaphase II oocytes, and its expression profile is similar to other oocyte-specific genes [Am J Hum Genet. 2011 Sep. 9; 89(3): 451-458]. It is also found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23]. Mice carrying homozygous null alleles for KHDC3L display a maternal effect defect in embryogenesis with delayed embryonic development and spindle abnormalities resulting in decreased litter sizes for homozygous females. In humans, KHDC3L has been implicated in familial biparental hydatidiform mole, a maternal-effect recessive inherited disorder [Ref: Am J Hum Genet. 2011 Sep. 9; 89(3): 451-458].

DNMT1

[0063] DNA (cytosine-5)-methyltransferase 1 (DNMT1) (Entrez Gene ID: 1786, HGNC ID: 2976) belongs to a group of enzymes that transfer methyl groups to position 5 of cytosine bases in DNA. While this process, known as DNA methylation, does not alter DNA base composition, it leaves "epigenetic" modifications to DNA molecules that affect the biochemical properties of the DNA region. DNA methylation, mediated by DNMT1, is crucial in determining cell fate during embryogenesis [Genes Dev. 2008 Jun. 15; 22(12):1607-16, Dev Biol. 2002 Jan. 1; 241(1):172-82]. Mouse embryos carrying homozygous null alleles for DNMT1 survive only to mid-gestation. The expression of the DNMT1 gene is significantly higher in reproductive tissues than other cell types, and is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348].

FMR1

[0064] Fragile X Mental Retardation 1 (FMR1) encodes the RNA-binding protein FMRP that is implicated in fragile X syndrome. The inhibition of translation may be a function of FMR1 in vivo, and failure of mutant FMR1 protein to oligomerize may contribute to the pathophysiologic events leading to fragile X syndrome. Fragile X premutations in female carriers appear to be a risk factor for premature ovarian failure: in 16% of the premutation carriers, menopause occurred before the age of 40, compared with none of the full-mutation carriers and 1 (0.4%) of the controls, indicating a significant association between premature menopause and premutation carrier status [Am. J. Med. Genet. 83: 322-325, 1999].

FOXO3

[0065] Forkhead box O3 (FOXO3) encodes a protein that induces apoptosis in cells, lying within the DNA damage response and repair pathways. FOXO3 knockout female mice exhibit infertility phenotypes, in particular abnormal ovarian follicular function. Mouse mutants carrying a homozygous non-synonymous substitution in exon 2 of the FOXO3 gene show loss of fertility at sexual maturity and exhibit premature ovarian failure [Mammalian Genome 22: 235-248, 2011].

MUC4

[0066] MUC4 belongs to a family of high-molecular-weight glycoproteins that protect and lubricate the epithelial surface of respiratory, gastrointestinal and reproductive tracts. The extracellular domain can interact with an epidermal growth factor receptor on the cell surface to modulate downstream cell growth signaling by stabilizing and/or enhancing the activity of cell growth receptor complexes [Nature Rev. Cancer. 4(1):45-60, 2004]. MUC4 is expressed in the endometrial epithelium and is associated with endometriosis development and endometriosis-related infertility such as embryo implantation [BMC Med. 9:19, 2011].

NLRP11

[0067] NLR family, pyrin domain containing 11 (NLRP11) encodes a leucine-rich protein belonging to a large family of proteins likely involved in inflammation [Nature Rev. Molec. Cell Biol. 4: 95-104, 2003], and is expressed in the ovary, testes and pre-implantation embryos [BMC Evol Biol. 2009 Aug. 14; 9:202. doi:10.1186/1471-2148-9-202]. NLRP11 gene expression shows specificity to reproductive tissues.

NLRP14

[0068] NLR family, pyrin domain containing 14 (NLRP14) encodes a leucine-rich protein belonging to a large family of proteins likely involved in inflammation [Nature Rev. Molec. Cell Biol. 4: 95-104, 2003], and is expressed in the ovary, testes and pre-implantation embryos [BMC Evol Biol. 2009 Aug. 14; 9:202. doi:10.1186/1471-2148-9-202]. NLRP14 is also found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348].

NLRP5

[0069] NLRP5 or MATER (Maternal antigen the embryos require), the protein encoded by the Nlrp5 gene, is another highly abundant oocyte protein that is essential in mouse for embryonic development beyond the two-cell stage. MATER was originally identified as an oocyte-specific antigen in a mouse model of autoimmune premature ovarian failure (Tong et al., Endocrinology, 140:3720-3726, 1999). MATER demonstrates a similar expression and subcellular expression profile to PADI6. Like Padi6-null animals, Nlrp5-null females exhibit normal oogenesis, ovarian development, oocyte maturation, ovulation and fertilization. However, embryos derived from Nlrp5-null females undergo a developmental block at the two-cell stage and fail to exhibit normal embryonic genome activation (Tong et al., Nat Genet. 26:267-268, 2000; and Tong et al. Mamm Genome 11:281-287, 2000b).

NLRP8

[0070] NLR family, pyrin domain containing 8 (NLRP8) encodes a leucine-rich protein belonging to a large family of proteins likely involved in inflammation [Nature Rev. Molec. Cell Biol. 4: 95-104, 2003], and is expressed in the ovary, testes and pre-implantation embryos [BMC Evol Biol. 2009 Aug. 14; 9:202. doi:10.1186/1471-2148-9-202]. NLRP8 gene expression shows specificity to reproductive tissues.

NPM2

[0071] The gene NPM2 [Entrez Gene id: 10361, HGNC id: 7930], or nucleoplasmin 2, is a chaperone that binds to histones and is involved in sperm chromatin remodeling after oocyte entry [Nucleic Acids Res. 2012 June; 40(11): 4861-4878]. NPM2 has been found in a screen for oocyte-specific genes involved in preimplantation embryonic development [Semin Reprod Med. 2007 July; 25(4):243-51], and is differentially expressed during final oocyte maturation and early embryonic development in humans [Fertil Steril. 2007 March; 87(3):677-90]. NPM2 is a maternal effect gene critical for nuclear and nucleolar organization and embryonic development, and is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348]. NPM2 is associated with abnormal oocyte morphology and reduced fertility in mice, and female mice homozygous null for NPM2 carry defects in preimplantation embryo development, with abnormalities in oocyte and early embryonic nuclei [Science. 2003 Apr. 25; 300(5619):633-6].

PADI6

[0072] Peptidylarginine Deiminase 6 (PADI6)

[0073] Padi6 was originally cloned from a 2D murine egg proteome gel based on its relative abundance, and Padi6 expression in mice appears to be almost entirely limited to the oocyte and pre-implantation embryo (Yurttas et al., 2010). Padi6 is first expressed in primordial oocyte follicles and persists, at the protein level, throughout pre-implantation development to the blastocyst stage (Wright et al., Dev Biol, 256:73-88, 2003). Inactivation of Padi6 leads to female infertility in mice, with the Padi6-null developmental arrest occurring at the two-cell stage (Yurttas et al., 2008).

PMS2

[0074] PMS2 is involved in DNA mismatch repair and in fertilization and pre-implantation development. It has been identified by knockout mouse studies as one of many maternal effect genes essential for development [Nature Cell Bio. 4 Suppl., pp. s41-9].

SCARB1

[0075] The scavenger receptor class B, member 1 (SCARB1) gene encodes a glycoprotein that is a receptor mediating cholesterol transport. SCARB1-null homozygous female mice were infertile with dysfunctional oocytes [J. Clin. Invest. 108: 1717-1722, 2001]; hence, mutations in SCARB1 may affect female fertility by regulating lipoprotein metabolism.

SPIN1

[0076] Spindlin 1 (SPIN1) is a gene abundantly expressed in early embryo development, during the transition from oocyte to pluripotent early embryo. SPIN1 is phosphorylated in a cell-cycle dependent manner and is associated with the meiotic spindle [Development 124: 493-503, 1997].

TACC3

[0077] Transforming, Acidic Coiled-Coil Containing Protein 3 (TACC3). In mice, TACC3 is abundantly expressed in the cytoplasm of growing oocytes, and is required for microtubule anchoring at the centrosome and for spindle assembly and cell survival (Fu et al., 2010). TACC3 is also found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348].

ZP1

[0078] Zona pellucida glycoprotein 1 (ZP1) encodes a protein that is a structural component of the zona pellucida, an extracellular matrix that surrounds the oocyte and early embryo.

ZP2

[0079] Zona pellucida glycoprotein 2 (ZP2) encodes a protein that is a structural component of the zona pellucida, an extracellular matrix that surrounds the oocyte and early embryo. ZP2 binds to acrosome-reacted sperm and is important in preventing polyspermy [Hum Reprod. 2004 July; 19(7): 1580-6].

ZP3

[0080] Zona pellucida glycoprotein 3 (ZP3) [Entrez Gene id: 7784, HGNC id: 13189] is a structural component of the zona pellucida, an extracellular matrix that surrounds the oocyte and early embryo. It is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [BMC Genomics. 2009 Aug. 3; 10:348]. ZP3 is also expressed in oocytes from early ovarian development, and is likely to have a role in the development of the primordial follicle before zona pellucida formation [Mol Cell Endocrinol. 2008 Jul. 16; 289(1-2):10-5]. Female mice carrying null alleles for ZP3 exhibit decreased ovary size and weight, abnormal ovarian folliculogenesis and ovulation, ultimately resulting in female infertility.

ZP4

[0081] Zona pellucida glycoprotein 4 (ZP4) encodes a protein that is a structural component of the zona pellucida, an extracellular matrix that surrounds the oocyte and early embryo. ZP4 stimulates the acrosome reaction as part of a signaling pathway that involves Protein Kinase A [Biol Reprod. 2008 November; 79(5):869-77].

DNA (Cytosine-5)-Methyltransferase 1 (DNMT1)

[0082] DNMT1 [Entrez Gene id: 1786, HGNC id: 2976] belongs to a group of enzymes that transfer methyl groups to position 5 of cytosine bases in DNA. While this process, known as DNA methylation, does not alter DNA base composition, it leaves "epigenetic" modifications to DNA molecules that affect the biochemical properties of the DNA region. DNA methylation, mediated by DNMT1, is crucial in determining cell fate during embryogenesis [Genes Dev. 2008 Jun. 15; 22(12):1607-16, Dev Biol. 2002 Jan. 1; 241(1):172-82]. Mouse embryos carrying homozygous null alleles for DNMT1 survive only to mid-gestation. The expression of the DNMT1 gene is significantly higher in reproductive tissues than in other cell types, and DNMT1 is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348].

[0083] The gene NPM2 [Entrez Gene id: 10361, HGNC id: 7930], or nucleoplasmin 2, is a chaperone that binds to histones and is involved in sperm chromatin remodeling after oocyte entry [Nucleic Acids Res. 2012 June; 40(11): 4861-4878]. NPM2 has been found in a screen for oocyte-specific genes involved in preimplantation embryonic development [Semin Reprod Med. 2007 July; 25(4):243-51], and is differentially expressed during final oocyte maturation and early embryonic development in humans [Fertil Steril. 2007 March; 87(3):677-90]. NPM2 is a maternal effect gene critical for nuclear and nucleolar organization and embryonic development, and is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23, BMC Genomics. 2009 Aug. 3; 10:348]. NPM2 is associated with abnormal oocyte morphology and reduced fertility in mice, and female mice homozygous null for NPM2 carry defects in preimplantation embryo development, with abnormalities in oocyte and early embryonic nuclei [Science. 2003 Apr. 25; 300(5619):633-6].

[0084] Oocyte-Expressed Protein (OOEP)

[0085] OOEP [Entrez Gene id: 441161, HGNC id: 21382] also goes by the identifiers KHDC2, FLOPED, HOEP19 and C6orf156. OOEP is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [Reproduction. 2010 May; 139(5):809-23]. OOEP is expressed in ovaries, but is not detectable in 11 other cell types, including male testes. Within the ovary, its expression is restricted to growing oocytes. The OOEP protein product sublocalizes to the subcortex of eggs and preimplantation embryos. OOEP homozygous null female mice have seemingly normal ovarian physiology and produce viable eggs that can be fertilized; however, the resulting embryos do not progress beyond cleavage-stage development, and hence these female mice are sterile. It is believed that a functioning OOEP is a prerequisite for pre-implantation mouse development [Dev Cell. 2008 September; 15(3): 416-425].

[0086] Factor Located in Oocytes Permitting Embryonic Development (FLOPED/OOEP)

[0087] The subcortical maternal complex (SCMC) is a poorly characterized murine oocyte structure to which several maternal effect gene products localize (Li et al. Dev Cell 15:416-425, 2008). PADI6, MATER, FILIA, TLE6, and FLOPED have been shown to localize to this complex (Li et al. Dev Cell 15:416-425, 2008; Yurttas et al. Development 135:2627-2636, 2008). This complex is not present in the absence of Floped and Nlrp5, and, similar to embryos resulting from Nlrp5-depleted oocytes, embryos resulting from Floped-null oocytes do not progress past the two-cell stage of mouse development (Li et al., 2008). FLOPED is a small (19 kD) RNA binding protein that has also been characterized under the name of MOEP19 (Herr et al., Dev Biol 314:300-316, 2008).

[0088] Zona Pellucida Glycoprotein 3 (ZP3)

[0089] ZP3 [Entrez Gene id: 7784, HGNC id: 13189] is a structural component of the zona pellucida, an extracellular matrix that surrounds the oocyte and early embryo. It is found within the set of maternal factors that are important for driving egg-to-embryo transition during fertilization [BMC Genomics. 2009 Aug. 3; 10:348]. ZP3 is also expressed in oocytes from early ovarian development, and is likely to have a role in the development of the primordial follicle before zona pellucida formation [Mol Cell Endocrinol. 2008 Jul. 16; 289(1-2):10-5]. Female mice carrying null alleles for ZP3 exhibit decreased ovary size and weight, abnormal ovarian folliculogenesis and ovulation, ultimately resulting in female infertility.

[0090] FIGLA (Factor in Germline Alpha)

[0091] FIGLA [Entrez Gene id: 344018, HGNC id:l] also goes by the gene identifiers POF6, BHLHC8, and FIGALPHA. This gene encodes a basic helix-loop-helix transcription factor that acts as an activator of oocyte genes. FIGLA is expressed in all ovarian follicular stages and in mature oocytes, and is required for normal folliculogenesis. FIGLA expression is also believed to repress genes normally expressed in male testes, and hence sustains the female phenotype by activating female and repressing male germ cell genetic hierarchies in growing oocytes during postnatal ovarian development [Mol

Cell Biol. 2010 July; 30(14)]. Female mice with FIGLA mutations show decreased oocyte numbers and abnormal ovarian folliculogenesis. Heterozygous mutations in FIGLA have been implicated in women with premature ovarian failure [Am J Hum Genet. 2008 June; 82(6):1342-8].

[0092] Peptidylarginine Deiminase 6 (PADI6)

[0093] Padi6 was originally cloned from a 2D murine egg proteome gel based on its relative abundance, and Padi6 expression in mice appears to be almost entirely limited to the oocyte and pre-implantation embryo (Yurttas et al., 2010). Padi6 is first expressed in primordial oocyte follicles and persists, at the protein level, throughout pre-implantation development to the blastocyst stage (Wright et al., Dev Biol, 256:73-88, 2003). Inactivation of Padi6 leads to female infertility in mice, with the Padi6-null developmental arrest occurring at the two-cell stage (Yurttas et al., 2008).

[0094] Maternal Antigen the Embryos Require (MATER/NLRP5)

[0095] MATER, the protein encoded by the Nlrp5 gene, is another highly abundant oocyte protein that is essential in mouse for embryonic development beyond the two-cell stage. MATER was originally identified as an oocyte-specific antigen in a mouse model of autoimmune premature ovarian failure (Tong et al., Endocrinology, 140:3720-3726, 1999). MATER demonstrates a similar expression and subcellular expression profile to PADI6. Like Padi6-null animals, Nlrp5-null females exhibit normal oogenesis, ovarian development, oocyte maturation, ovulation and fertilization. However, embryos derived from Nlrp5-null females undergo a developmental block at the two-cell stage and fail to exhibit normal embryonic genome activation (Tong et al., Nat Genet. 26:267-268, 2000; and Tong et al. Mamm Genome 11:281-287, 2000b).

[0096] KH Domain Containing 3-Like, Subcortical Maternal Complex Member (FILIA/KHDC3L)

[0097] FILIA is another small RNA-binding domain containing maternally inherited murine protein. FILIA was identified and named for its interaction with MATER (Ohsugi et al. Development 135:259-269, 2008). Like other components of the SCMC, maternal inheritance of the Khdc3 gene product is required for early embryonic development. In mice, loss of Khdc3 results in a developmental arrest of varying severity with a high incidence of aneuploidy due, in part, to improper chromosome alignment during early cleavage divisions (Li et al., 2008). Khdc3 depletion also results in aneuploidy, due to spindle assembly checkpoint (SAC) inactivation, abnormal spindle assembly, and chromosome misalignment (Zheng et al. Proc Natl Acad Sci USA 106:7473-7478, 2009).

[0098] Basonuclin (BNC1)

[0099] Basonuclin is a zinc finger transcription factor that has been studied in mice. It is expressed in keratinocytes and germ cells (male and female) and regulates rRNA (via polymerase I) and mRNA (via polymerase II) synthesis (Iuchi and Green, 1999; Wang et al., 2006). Depending on the amount by which expression is reduced in oocytes, embryos may not develop beyond the 8-cell stage. In Bnc1-depleted mice, a normal number of oocytes are ovulated even though oocyte development is perturbed, but many of these oocytes cannot go on to yield viable offspring (Ma et al., 2006).

[0100] Zygote Arrest 1 (ZAR1). Zar1 is an oocyte-specific maternal effect gene that is known to function at the oocyte-to-embryo transition in mice. High levels of Zar1 expression are observed in the cytoplasm of murine oocytes, and homozygous-null females are infertile: growing oocytes from Zar1-null females do not progress past the two-cell stage.

[0101] Cytosolic Phospholipase A2γ (PLA2G4C)

[0102] Under normal conditions, expression of cPLA2γ, the protein product of the murine PLA2G4C ortholog, is restricted to oocytes and early embryos in mice. At the subcellular level, cPLA2γ mainly localizes to the cortical regions, nucleoplasm, and multivesicular aggregates of oocytes. It is also worth noting that while cPLA2γ expression does appear to be mainly limited to oocytes and pre-implantation embryos in healthy mice, expression is considerably up-regulated within the intestinal epithelium of mice infected with Trichinella spiralis. This suggests that cPLA2γ may also play a role in the inflammatory response. The human PLA2G4C differs in that, rather than being abundantly expressed in the ovary, it is abundantly expressed in the heart and skeletal muscle. Also, the human protein contains a lipase consensus sequence but lacks a calcium-binding domain found in other PLA2 enzymes. Accordingly, another cytosolic phospholipase may be more relevant for human fertility.

[0103] Transforming, Acidic Coiled-Coil Containing Protein 3 (TACC3)

[0104] In mice, TACC3 is abundantly expressed in the cytoplasm of growing oocytes, and is required for microtubule anchoring at the centrosome and for spindle assembly and cell survival (Fu et al., 2010). In certain embodiments, the gene is a gene that is expressed in an oocyte. Exemplary genes include CTCF, ZFP57, POU5F1, SEBOX, and HDAC1.

[0105] In other embodiments, the gene is a gene that is involved in DNA repair pathways, including but not limited to MLH1, PMS1 and PMS2. In other embodiments, the gene is BRCA1 or BRCA2.

[0106] In other embodiments, the biomarker is a gene product (e.g., RNA or protein) of an infertility-associated gene. In particular embodiments, the gene product is a gene product of a maternal effect gene. In other embodiments, the gene product is a product of a gene from Table 1. In certain embodiments, the gene product is a product of a gene that is expressed in an oocyte, such as a product of CTCF, ZFP57, POU5F1, SEBOX, or HDAC1. In other embodiments, the gene product is a product of a gene that is involved in DNA repair pathways, such as a product of MLH1, PMS1, or PMS2. In other embodiments, the gene product is a product of BRCA1 or BRCA2.

[0107] In other embodiments, the biomarker may be an epigenetic factor, such as methylation patterns (e.g., hypermethylation of CpG islands), genomic localization or post-translational modification of histone proteins, or general post-translational modification of proteins such as acetylation, ubiquitination, phosphorylation, or others.

[0108] In other embodiments, methods of the invention analyze infertility-associated biomarkers in order to assess the risk of infertility.

[0109] In certain embodiments, the biomarker is a genetic region, gene, or RNA/protein product of a gene associated with the one carbon metabolism pathway and other pathways that affect methylation of cellular macromolecules. Exemplary genes and products of those genes are described below.

[0110] Methylenetetrahydrofolate Reductase (MTHFR)

[0111] In particular embodiments, a mutation (677C>T) in the MTHFR gene is associated with infertility. The enzyme 5,10-methylenetetrahydrofolate reductase regulates folate activity (Pavlik et al., Fertility and Sterility 95(7): 2257-2262, 2011). The 677TT genotype is known in the art to be associated with 60% reduced enzyme activity, inefficient folate metabolism, decreased blood folate, elevated plasma homocysteine levels, and reduced methylation capacity. Pavlik et al. (2011) investigated the effect of MTHFR 677C>T on serum anti-Mullerian hormone (AMH) concentrations and on the numbers of oocytes retrieved (NOR) following controlled ovarian hyperstimulation (COH). Two hundred and seventy women undergoing COH for IVF were analyzed, and their AMH levels were determined from blood samples collected after 10 days of GnRH superagonist treatment and before COH. Average AMH levels of TT carriers were significantly higher than those of homozygous CC or heterozygous CT individuals. AMH serum concentrations correlated significantly with the NOR in all individuals studied. The study concluded that the MTHFR 677TT genotype is associated with higher serum AMH concentrations but paradoxically has a negative effect on NOR after COH. It was proposed that follicle maturation might be retarded in MTHFR 677TT individuals, which could subsequently lead to a higher proportion of initially recruited follicles that produce AMH but fail to progress towards cyclic recruitment. The tissue gene expression patterns of MTHFR do not show any bias towards oocyte expression. Analyzing a sample for this mutation or other mutations (Table 1) in the MTHFR gene or abnormal gene expression of products of the MTHFR gene allows one to assess a risk of infertility.

[0112] Jeddi-Tehrani et al. (American Journal of Reproductive Immunology 66(2):149-156, 2011) investigated the effect of the MTHFR 677TT genotype on Recurrent Pregnancy Loss (RPL). One hundred women below 35 years of age with two successive pregnancy losses and one hundred healthy women with at least two normal pregnancies were used to assess the frequency of five candidate genetic risk factors for RPL: MTHFR 677C>T, MTHFR 1298A>C, PAI-1 -675 4G/5G (Plasminogen Activator Inhibitor-1 promoter region), BF -455G/A (Beta Fibrinogen promoter region), and ITGB3 1565T/C (Integrin Beta 3). The frequencies of the polymorphisms were calculated and compared between case and control groups. Both MTHFR polymorphisms (677C>T and 1298A>C) and the BF -455G/A polymorphism were found to be positively associated with RPL, and the ITGB3 1565T/C polymorphism was found to be negatively associated with RPL. Homozygosity, but not heterozygosity, for the PAI-1 -675 4G/5G polymorphism was significantly higher in patients with RPL than in the control group. The presence of both MTHFR mutations greatly increased the risk of RPL. Analyzing a sample for these mutations and other mutations (Table 1) in the MTHFR gene or abnormal gene expression of products of the MTHFR gene allows one to assess a risk of infertility.

[0113] Catechol-O-Methyltransferase (COMT)

[0114] In particular embodiments, a mutation (472G>A) in the COMT gene is associated with infertility. Catechol-O-methyltransferase is known in the art to be one of several enzymes that inactivate catecholamine neurotransmitters by transferring a methyl group from SAM (S-adenosyl methionine) to the catecholamine. The AA gene variant is known to alter the enzyme's thermostability and reduce its activity 3- to 4-fold (Schmidt et al., Epidemiology 22(4): 476-485, 2011). Salih et al. (Fertility and Sterility 89(5, Supplement 1): 1414-1421, 2008) investigated the regulation of COMT expression in granulosa cells and assessed the effects of 2-ME2 (COMT product) and COMT inhibitors on DNA proliferation and steroidogenesis in JC410 porcine and HGL5 human granulosa cell lines in in vitro experiments. They further assessed the regulation of COMT expression by DHT (dihydrotestosterone), insulin, and ATRA (all-trans retinoic acid). They concluded that COMT expression in granulosa cells was up-regulated by insulin, DHT, and ATRA. Further, 2-ME2 decreased, and COMT inhibition increased, granulosa cell proliferation and steroidogenesis. It was hypothesized that COMT overexpression with a subsequent increased level of 2-ME2 may lead to ovulatory dysfunction. Analyzing a sample for this mutation in the COMT gene or abnormal gene expression of products of the COMT gene allows one to assess a risk of infertility.

[0115] Methionine Synthase Reductase (MTRR)

[0116] In particular embodiments, a mutation (A66G) in the Methionine Synthase Reductase (MTRR) gene is associated with infertility. MTRR is required for the proper function of the enzyme Methionine Synthase (MTR). MTR converts homocysteine to methionine, and MTRR activates MTR, thereby regulating levels of homocysteine and methionine. The maternal variant A66G has been associated with early developmental disorders such as Down's syndrome (Pozzi et al., 2009) and Spina Bifida (Doolin et al., American Journal of Human Genetics 71(5): 1222-1226, 2002). Analyzing a sample for this mutation in the MTRR gene or abnormal gene expression of products of the MTRR gene allows one to assess the risk of infertility.

[0117] Betaine-Homocysteine S-Methyltransferase (BHMT)

[0118] In particular embodiments, a mutation (G716A) in the BHMT gene is associated with infertility. Betaine-Homocysteine S-Methyltransferase (BHMT), along with MTRR, assists in the folate/B-12 dependent and choline/betaine-dependent conversions of homocysteine to methionine. High homocysteine levels have been linked to female infertility (Berker et al., Human Reproduction 24(9): 2293-2302, 2009). Benkhalifa et al. (2010) discuss that controlled ovarian hyperstimulation (COH) affects homocysteine concentration in follicular fluid. Using germinal vesicle oocytes from patients involved in IVF procedures, the study concludes that the human oocyte is able to regulate its homocysteine level via remethylation using MTR and BHMT, but not CBS (Cystathionine Beta Synthase). They further emphasize that this may regulate the risk of imprinting problems during IVF procedures. Analyzing a sample for this mutation in the BHMT gene or abnormal gene expression of products of the BHMT gene allows one to assess a risk of infertility.

[0119] Ikeda et al. (Journal of Experimental Zoology Part A: Ecological Genetics and Physiology 313A(3): 129-136, 2010) examined the expression patterns of all methylation pathway enzymes in bovine oocytes and preimplantation embryos. Bovine oocytes were demonstrated to have the mRNA of MAT1A (Methionine adenosyltransferase), MAT2A, MAT2B, AHCY (S-adenosylhomocysteine hydrolase), MTR, BHMT, SHMT1 (Serine hydroxymethyltransferase), SHMT2, and MTHFR. All these transcripts were consistently expressed through all the developmental stages, except MAT1A, which was not detected from the 8-cell stage onward, and BHMT, which was not detected in the 8-cell stage. Furthermore, the effect of exogenous homocysteine on preimplantation development of bovine embryos was investigated in vitro. High concentrations of homocysteine induced hypermethylation of genomic DNA as well as developmental retardation in bovine embryos. Analyzing a sample for these irregular methylation patterns allows one to assess a risk of infertility.
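The case-control comparisons summarized above (for example, polymorphism frequencies in women with recurrent pregnancy loss versus healthy controls) can be illustrated with a minimal sketch. It computes an odds ratio and a two-sided Fisher exact p-value for a 2x2 carrier table. The function names and all counts are hypothetical and chosen only for illustration; they are not data from the cited studies, and the cited authors may have used different statistics.

```python
from math import comb


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    a = case_carriers                      # cases with the variant
    b = case_total - case_carriers         # cases without the variant
    c = control_carriers                   # controls with the variant
    d = control_total - control_carriers   # controls without the variant
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: 30 of 100 cases and 15 of 100 controls carry the variant.
print(odds_ratio(30, 100, 15, 100))
print(fisher_exact_two_sided(30, 70, 15, 85))
```

An odds ratio above 1 would point toward a positive association and one below 1 toward a negative association, subject to the p-value and to correction for the number of polymorphisms tested.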

[0120] Folate Receptor 2 (FOLR2)

[0121] In particular embodiments, a mutation (rs2298444) in the FOLR2 gene is associated with infertility. Folate Receptor 2 helps transport folate (and folate derivatives) into cells. Elnakat and Ratnam (Frontiers in Bioscience: a journal and virtual library 11: 506-519, 2006) implicate FOLR2, along with FOLR1, in ovarian and endometrial cancers. Analyzing a sample for mutations in the FOLR2 or FOLR1 genes or abnormal gene expression of products of the FOLR2 or FOLR1 genes allows one to assess a risk of infertility.

[0122] Transcobalamin 2 (TCN2)

[0123] In particular embodiments, a mutation (C776G) in the TCN2 gene is associated with infertility. Transcobalamin 2 facilitates transport of cobalamin (vitamin B12) into cells. Stanislawska-Sachadyn et al. (Eur J Clin Nutr 64(11): 1338-1343, 2010) assessed the relationship between the TCN2 776C>G polymorphism and both serum B12 and total homocysteine (tHcy) levels. Genotypes from 613 men from Northern Ireland were used to show that the TCN2 776CC genotype was associated with lower serum B12 concentrations when compared to the 776CG and 776GG genotypes. Furthermore, vitamin B12 status was shown to influence the relationship between TCN2 776C>G genotype and tHcy concentrations. The TCN2 776C>G polymorphism may contribute to the risk of pathologies associated with a low B12 and high total homocysteine phenotype. Analyzing a sample for this mutation in the TCN2 gene or abnormal gene expression of products of the TCN2 gene allows one to assess a risk of infertility.

[0124] Cystathionine-Beta-Synthase (CBS)

[0125] In particular embodiments, a mutation (rs234715) in the CBS gene is associated with infertility. With vitamin B6 as a cofactor, the Cystathionine-Beta-Synthase (CBS) enzyme catalyzes a reaction that permanently removes homocysteine from the methionine pathway by diverting it to the transsulfuration pathway. CBS gene mutations associated with decreased CBS activity also lead to elevated plasma homocysteine levels. Guzman et al. (2006) demonstrate that Cbs knockout mice are infertile. They further explain that Cbs-null female infertility is a consequence of uterine failure, which is a consequence of hyperhomocysteinemia or other factor(s) in the uterine environment. Analyzing a sample for this mutation in the CBS gene or abnormal gene expression of products of the CBS gene allows one to assess a risk of infertility.

[0126] In certain embodiments, the biomarker is a genetic region that has been previously associated with female infertility. A SNP association study by targeted re-sequencing was performed to search for new genetic variants associated with female infertility. Such methods have been successful in identifying significant variants associated with a wide range of diseases (Rehman et al., 2010; Walsh et al., 2010). Briefly, a SNP association study is performed by collecting SNPs in genetic regions of interest in a number of samples and controls and then testing each of the SNPs that showed significant frequency differences between cases and controls. Significant frequency differences between cases and controls indicate that the SNP is associated with the condition of interest.

Assays

[0127] Methods of the invention involve conducting an assay that detects either a mutation in an infertility-associated gene or abnormal expression (over or under) of an infertility-associated gene product. In particular embodiments, the assay is conducted on infertility-associated genetic regions or products of these regions. Detailed descriptions of conventional methods, such as those employed to make and use nucleic acid arrays, amplification primers, hybridization probes, and the like can be found in standard laboratory manuals such as: Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Cold Spring Harbor Laboratory Press; PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratory Press; and Sambrook, J et al., (2001) Molecular Cloning: A Laboratory Manual, 2nd ed. (Vols. 1-3), Cold Spring Harbor Laboratory Press. Custom nucleic acid arrays are commercially available from, e.g., Affymetrix (Santa Clara, Calif.), Applied Biosystems (Foster City, Calif.), and Agilent Technologies (Santa Clara, Calif.).

[0128] Methods of detecting mutations in genetic regions are known in the art. In certain embodiments, a mutation in a single infertility-associated genetic region indicates infertility. In other embodiments, the assay is conducted on more than one genetic region, and a mutation in at least two of the genetic regions indicates infertility. In other embodiments, a mutation in at least three of the genetic regions indicates infertility; a mutation in at least four of the genetic regions indicates infertility; a mutation in at least five of the genetic regions indicates infertility; a mutation in at least six of the genetic regions indicates infertility; a mutation in at least seven of the genetic regions indicates infertility; a mutation in at least eight of the genetic regions indicates infertility; a mutation in at least nine of the genetic regions indicates infertility; a mutation in at least 10 of the genetic regions indicates infertility; a mutation in at least 15 of the genetic regions indicates infertility; or a mutation in all of the genetic regions from Table 1 indicates infertility.

[0129] In certain embodiments, a known single nucleotide polymorphism at a particular position can be detected by single base extension for a primer that binds to the sample DNA adjacent to that position. See for example Shuber et al. (U.S. Pat. No. 6,566,101), the content of which is incorporated by reference herein in its entirety. In other embodiments, a hybridization probe might be employed that overlaps the SNP of interest and selectively hybridizes to sample nucleic acids containing a particular nucleotide at that position. See for example Shuber et al. (U.S. Pat. Nos. 6,214,558 and 6,300,077), the content of which is incorporated by reference herein in its entirety.

[0130] In particular embodiments, nucleic acids are sequenced in order to detect variants (i.e., mutations) in the nucleic acid compared to wild-type and/or non-mutated forms of the sequence. The nucleic acid can include a plurality of nucleic acids derived from a plurality of genetic elements. Methods of detecting sequence variants are known in the art, and sequence variants can be detected by any sequencing method known in the art, e.g., ensemble sequencing or single molecule sequencing.

[0131] Sequencing may be by any method known in the art. DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing by synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, polony sequencing, and SOLiD sequencing. Sequencing of separated molecules has more recently been demonstrated by sequential or single extension reactions using polymerases or ligases as well as by single or sequential differential hybridizations with libraries of probes.

[0132] One conventional method to perform sequencing is by chain termination and gel separation, as described by Sanger et al., Proc Natl. Acad. Sci. USA, 74(12): 5463-67 (1977). Another conventional sequencing method involves chemical degradation of nucleic acid fragments. See, Maxam et al., Proc. Natl. Acad. Sci., 74: 560-564 (1977). Methods have also been developed based upon sequencing by hybridization. See, e.g., Harris et al. (U.S. patent application number 2009/0156412). The content of each reference is incorporated by reference herein in its entirety.

[0133] A sequencing technique that can be used in the methods of the provided invention includes, for example, Helicos True Single Molecule Sequencing (tSMS) (Harris T. D. et al. (2008) Science 320:106-109). In the tSMS technique, a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. The templates can be at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer.

... the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi), which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed.

[0135] Another example of a DNA sequencing technique that can be used in the methods of the provided invention is SOLiD technology (Applied Biosystems). In SOLiD sequencing, genomic DNA is sheared into fragments, and adaptors are attached to the 5' and 3' ends of the fragments to generate a fragment library. Alternatively, internal adaptors can be introduced by ligating adaptors to the 5' and 3' ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5' and 3' ends of the resulting fragments to generate a mate-paired library. Next, clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3' modification that permits bonding to a glass slide. The sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed and the process is then repeated.
The polymerase incorporates the labeled nucle 0.136 Another example of a DNA sequencing technique otides to the primer in a template directed manner. The poly that can be used in the methods of the provided invention is merase and unincorporated nucleotides are removed. The Ion Torrent sequencing (U.S. patent application numbers templates that have directed incorporation of the fluores 2009/0026082, 2009/0127589, 2010/0035252, 2010/ cently labeled nucleotide are detected by imaging the flow 0.1371.43, 2010/0188073, 2010/0197507, 2010/0282617, cell Surface. After imaging, a cleavage step removes the fluo 2010/0300559), 2010/0300895, 2010/0301398, and 2010/ rescent label, and the process is repeated with other fluores 0304982), the content of each of which is incorporated by cently labeled nucleotides until the desired read length is reference herein in its entirety. In Ion Torrent sequencing, achieved. Sequence information is collected with each nucle DNA is sheared into fragments of approximately 300-800 otide addition step. Further description oftSMS is shown for base pairs, and the fragments are blunt ended. Oligonucle example in Lapidus et al. (U.S. Pat. No. 7,169,560), Lapidus otide adaptors are then ligated to the ends of the fragments. et al. (U.S. patent application number 2009/019 1565), Quake The adaptors serve as primers for amplification and sequenc et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No. 7,282, ing of the fragments. The fragments can be attached to a 337), Quake et al. (U.S. patent application number 2002/ Surface and is attached at a resolution Such that the fragments 0164629), and Braslaysky, et al., PNAS (USA), 100: 3960 are individually resolvable. Addition of one or more nucle 3964 (2003), the contents of each of these references is otides releases a proton (H), which signal detected and incorporated by reference herein in its entirety. recorded in a sequencing instrument. The signal strength is 0134. 
Another example of a DNA sequencing technique proportional to the number of nucleotides incorporated. that can be used in the methods of the provided invention is 0.137 Another example of a sequencing technology that 454 sequencing (Roche) (Margulies, Metal. 2005, Nature, can be used in the methods of the provided invention is 437, 376-380). 454 sequencing involves two steps. In the first Illumina sequencing. Illumina sequencing is based on the step, DNA is sheared into fragments of approximately 300 amplification of DNA on a solid surface using fold-back PCR 800 base pairs, and the fragments are blunt ended. Oligo and anchored primers. Genomic DNA is fragmented, and nucleotide adaptors are then ligated to the ends of the frag adapters are added to the 5' and 3' ends of the fragments. DNA ments. The adaptors serve as primers for amplification and fragments that are attached to the surface offlow cell channels sequencing of the fragments. The fragments can be attached are extended and bridge amplified. The fragments become to DNA capture beads, e.g., Streptavidin-coated beads using, double stranded, and the double stranded molecules are dena e.g., Adaptor B, which contains 5'-biotin tag. The fragments tured. Multiple cycles of the solid-phase amplification fol attached to the beads are PCR amplified within droplets of an lowed by denaturation can create several million clusters of oil-water emulsion. The result is multiple copies of clonally approximately 1,000 copies of single-stranded DNA mol amplified DNA fragments on each bead. In the second step, ecules of the same template in each channel of the flow cell. US 2015/O 142331 A1 May 21, 2015 37
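As a rough illustration of the statement in paragraphs [0134] and [0136] that signal strength is proportional to the number of nucleotides incorporated, the following Python sketch converts a series of per-flow signal values into a base sequence. The function name `call_bases`, the flow order, the unit signal, and the example values are assumptions made for illustration only; they are not taken from the specification or from any instrument software, and practical base callers also correct for noise and signal decay.

```python
from typing import Sequence


def call_bases(flow_order: str, signals: Sequence[float], unit_signal: float = 1.0) -> str:
    """Turn per-flow signal intensities into a base sequence (illustrative only).

    Each flow presents one nucleotide. Under the proportionality assumption,
    a signal of about k * unit_signal is read as a run of k identical bases.
    """
    if unit_signal <= 0:
        raise ValueError("unit_signal must be positive")
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]  # flows cycle through the order
        # Round to the nearest whole number of incorporations, never below zero.
        run_length = max(0, int(signal / unit_signal + 0.5))
        bases.append(nucleotide * run_length)
    return "".join(bases)


# Hypothetical flow values: T x1, A x0, C x2, G x0, T x0, A x1, C x0, G x3
flows = [1.04, 0.03, 2.10, 0.00, 0.00, 0.97, 0.05, 2.95]
print(call_bases("TACG", flows))  # prints TCCAGGG
```

Rounding to the nearest integer is the simplest possible reading of the proportional signal; it becomes unreliable for long homopolymer runs, where neighboring run lengths produce signals that differ by a small relative amount.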

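The case/control frequency comparison outlined in paragraph [0126] can be sketched as a test on allele counts. The example below uses a Pearson chi-square test on a 2x2 table, which is one common choice and is an assumption here, since the specification does not name a particular statistical test. The function name `allele_association` and all counts are hypothetical.

```python
import math


def allele_association(case_alt: int, case_ref: int, control_alt: int, control_ref: int):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Illustrative sketch, not the applicants' method.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_sums)
    if total == 0 or 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles per subject.
chi2, p = allele_association(case_alt=60, case_ref=140, control_alt=30, control_ref=170)
print(f"chi2={chi2:.2f}, p={p:.2g}")
```

When many SNPs are tested in the same study, the per-SNP p-values would normally need a multiple-testing correction before any SNP is called significant.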
Primers, DNA polymerase, and four fluorophore-labeled, reversibly terminating nucleotides are used to perform sequential sequencing. After nucleotide incorporation, a laser is used to excite the fluorophores, and an image is captured and the identity of the first base is recorded. The 3' terminators and fluorophores from each incorporated base are removed and the incorporation, detection and identification steps are repeated.

[0138] Another example of a sequencing technology that can be used in the methods of the provided invention includes the single molecule, real-time (SMRT) technology of Pacific Biosciences. In SMRT, each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked. A single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW). A ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds). It takes several milliseconds to incorporate a nucleotide into a growing strand. During this time, the fluorescent label is excited and produces a fluorescent signal, and the fluorescent tag is cleaved off. Detection of the corresponding fluorescence of the dye indicates which base was incorporated. The process is repeated.

[0139] Another example of a sequencing technique that can be used in the methods of the provided invention is nanopore sequencing (Soni G V and Meller A. (2007) Clin Chem 53: 1996-2001). A nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.

[0140] Another example of a sequencing technique that can be used in the methods of the provided invention involves using a chemical-sensitive field effect transistor (chemFET) array to sequence DNA (for example, as described in US Patent Application Publication No. 20090026082). In one example of the technique, DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase. Incorporation of one or more triphosphates into a new nucleic acid strand at the 3' end of the sequencing primer can be detected by a change in current by a chemFET. An array can have multiple chemFET sensors. In another example, single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.

[0141] Another example of a sequencing technique that can be used in the methods of the provided invention involves using an electron microscope (Moudrianakis E. N. and Beer M. Proc Natl Acad Sci USA. 1965 March; 53:564-71). In one example of the technique, individual DNA molecules are labeled using metallic labels that are distinguishable using an electron microscope. These molecules are then stretched on a flat surface and imaged using an electron microscope to measure sequences.

[0142] If the nucleic acid from the sample is degraded or only a minimal amount of nucleic acid can be obtained from the sample, PCR can be performed on the nucleic acid in order to obtain a sufficient amount of nucleic acid for sequencing (See e.g., Mullis et al. U.S. Pat. No. 4,683,195, the contents of which are incorporated by reference herein in its entirety).

[0143] Methods of detecting levels of gene products (e.g., RNA or protein) are known in the art. Commonly used methods known in the art for the quantification of mRNA expression in a sample include northern blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-283 (1999), the contents of which are incorporated by reference herein in their entirety); RNAse protection assays (Hod, Biotechniques 13:852-854 (1992), the contents of which are incorporated by reference herein in their entirety); and PCR-based methods, such as reverse transcription polymerase chain reaction (RT-PCR) (Weis et al., Trends in Genetics 8:263-264 (1992), the contents of which are incorporated by reference herein in their entirety). Alternatively, antibodies may be employed that can recognize specific duplexes, including RNA duplexes, DNA-RNA hybrid duplexes, or DNA-protein duplexes. Other methods known in the art for measuring gene expression (e.g., RNA or protein amounts) are shown in Yeatman et al. (U.S. patent application number 2006/0195269), the content of which is hereby incorporated by reference in its entirety.

[0144] A differentially expressed gene or differential gene expression refers to a gene whose expression is activated to a higher or lower level in a subject suffering from a disorder, such as infertility, relative to its expression in a normal or control subject. The terms also include genes whose expression is activated to a higher or lower level at different stages of the same disorder. It is also understood that a differentially expressed gene may be either activated or inhibited at the nucleic acid level or protein level, or may be subject to alternative splicing to result in a different polypeptide product. Such differences may be evidenced by a change in mRNA levels, surface expression, secretion or other partitioning of a polypeptide, for example.

[0145] Differential gene expression may include a comparison of expression between two or more genes or their gene products, or a comparison of the ratios of the expression between two or more genes or their gene products, or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects suffering from a disorder, such as infertility, or between various stages of the same disorder. Differential expression includes both quantitative, as well as qualitative, differences in the temporal or cellular expression pattern in a gene or its expression products. Differential gene expression (increases and decreases in expression) is based upon percent or fold changes over expression in normal cells. Increases may be of 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, or 200% relative to expression levels in normal cells. Alternatively, fold increases may be of 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or 10 fold over expression levels in normal cells. Decreases may be of 1, 5, 10, 20, 30, 40, 50, 55, 60, 65, 70, 75, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 99 or 100% relative to expression levels in normal cells.

[0146] In certain embodiments, reverse transcriptase PCR (RT-PCR) is used to measure gene expression. RT-PCR is a quantitative method that can be used to compare mRNA levels in different sample populations to characterize patterns of gene expression, to discriminate between closely related mRNAs, and to analyze RNA structure.

[0147] The first step is the isolation of mRNA from a target sample. The starting material is typically total RNA isolated from human tissues or fluids.

[0148] General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). The contents of each of these references are incorporated by reference herein in their entirety. In particular, RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.

[0149] The first step in gene expression profiling by RT-PCR is the reverse transcription of the RNA template into cDNA, followed by its exponential amplification in a PCR reaction. The two most commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase (AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The reverse transcription step is typically primed using specific primers, random hexamers, or oligo-dT primers, depending on the circumstances and the goal of expression profiling. For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit (Perkin Elmer, Calif., USA), following the manufacturer's instructions. The derived cDNA can then be used as a template in the subsequent PCR reaction.

[0150] Although the PCR step can use a variety of thermostable DNA-dependent DNA polymerases, it typically employs the Taq DNA polymerase, which has a 5'-3' nuclease activity but lacks a 3'-5' proofreading exonuclease activity. Thus, TaqMan® PCR typically utilizes the 5'-nuclease activity of Taq polymerase to hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with equivalent 5' nuclease activity can be used. Two oligonucleotide primers are used to generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is designed to detect nucleotide sequence located between the two PCR primers. The probe is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the reporter dye is quenched by the quenching dye when the two dyes are located close together as they are on the probe. During the amplification reaction, the Taq DNA polymerase enzyme cleaves the probe in a template-dependent manner. The resultant probe fragments disassociate in solution, and signal from the released reporter dye is free from the quenching effect of the second fluorophore. One molecule of reporter dye is liberated for each new molecule synthesized, and detection of the unquenched reporter dye provides the basis for quantitative interpretation of the data.

[0151] TaqMan® RT-PCR can be performed using commercially available equipment, such as, for example, ABI PRISM 7700™ Sequence Detection System™ (Perkin-Elmer-Applied Biosystems, Foster City, Calif., USA), or Lightcycler (Roche Molecular Biochemicals, Mannheim, Germany). In certain embodiments, the 5' nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 7700™ Sequence Detection System™. The system consists of a thermocycler, laser, charge-coupled device (CCD), camera and computer. The system amplifies samples in a 96-well format on a thermocycler. During amplification, laser-induced fluorescent signal is collected in real-time through fiber optics cables for all 96 wells, and detected at the CCD. The system includes software for running the instrument and for analyzing the data.

[0152] 5'-Nuclease assay data are initially expressed as Ct, or the threshold cycle. As discussed above, fluorescence values are recorded during every cycle and represent the amount of product amplified to that point in the amplification reaction. The point when the fluorescent signal is first recorded as statistically significant is the threshold cycle (Ct).

[0153] To minimize errors and the effect of sample-to-sample variation, RT-PCR is usually performed using an internal standard. The ideal internal standard is expressed at a constant level among different tissues, and is unaffected by the experimental treatment. RNAs most frequently used to normalize patterns of gene expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase (GAPDH) and β-actin. For performing analysis on pre-implantation embryos and oocytes, Chuk is a gene that is used for normalization.

[0154] A more recent variation of the RT-PCR technique is the real time quantitative PCR, which measures PCR product accumulation through a dual-labeled fluorogenic probe (i.e., TaqMan® probe). Real time PCR is compatible both with quantitative competitive PCR, in which internal competitor for each target sequence is used for normalization, and with quantitative comparative PCR using a normalization gene contained within the sample, or a housekeeping gene for RT-PCR. For further details see, e.g. Heid et al., Genome Research 6:986-994 (1996), the contents of which are incorporated by reference herein in their entirety.

[0155] In another embodiment, a MassARRAY-based gene expression profiling method is used to measure gene expression. In the MassARRAY-based gene expression profiling method, developed by Sequenom, Inc. (San Diego, Calif.), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions, except a single base, and serves as an internal standard. The cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides. After inactivation of the alkaline phosphatase, the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for analysis with matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis. The cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated. For further details see, e.g. Ding and Cantor, Proc. Natl. Acad. Sci. USA 100:3059-3064 (2003).

[0156] Further PCR-based techniques include, for example, differential display (Liang and Pardee, Science 257:967-971 (1992)); amplified fragment length polymorphism (iAFLP) (Kawamoto et al., Genome Res. 12:1305-1312 (1999)); BeadArray™ technology (Illumina, San Diego, Calif.; Oliphant et al., Discovery of Markers for Disease (Supplement to Biotechniques), June 2002; Ferguson et al., Analytical Chemistry 72:5618 (2000)); BeadsArray for Detection of Gene Expression (BADGE), using the commercially available Luminex 100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression (Yang et al., Genome Res. 11:1888-1898 (2001)); and high coverage expression profiling (HiCEP) analysis (Fukumura et al., Nucl. Acids. Res. 31(16) e94 (2003)). The contents of each of which are incorporated by reference herein in their entirety.

[0157] In certain embodiments, differential gene expression can also be identified, or confirmed, using a microarray technique. In this method, polynucleotide sequences of interest (including cDNAs and oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences are then hybridized with specific DNA probes from cells or tissues of interest. Methods for making microarrays and determining gene product expression (e.g., RNA or protein) are shown in Yeatman et al. (U.S. patent application number 2006/0195269), the content of which is incorporated by reference herein in its entirety.

[0158] In a specific embodiment of the microarray technique, PCR amplified inserts of cDNA clones are applied to a substrate in a dense array, for example, at least 10,000 nucleotide sequences are applied to the substrate. The microarrayed genes, immobilized on the microchip at 10,000 elements each, are suitable for hybridization under stringent conditions. Fluorescently labeled cDNA probes may be generated through incorporation of fluorescent nucleotides by reverse transcription of RNA extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize with specificity to each spot of DNA on the array. After stringent washing to remove non-specifically bound probes, the chip is scanned by confocal laser microscopy or by another detection method, such as a CCD camera. Quantitation of hybridization of each arrayed element allows for assessment of corresponding mRNA abundance. With dual color fluorescence, separately labeled cDNA probes generated from two sources of RNA are hybridized pair-wise to the array. The relative abundance of the transcripts from the two sources corresponding to each specified gene is thus determined simultaneously. The miniaturized scale of the hybridization affords a convenient and rapid evaluation of the expression pattern for large numbers of genes. Such methods have been shown to have the sensitivity required to detect rare transcripts, which are expressed at a few copies per cell, and to reproducibly detect at least approximately two-fold differences in the expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996), the contents of which are incorporated by reference herein in their entirety). Microarray analysis can be performed by commercially available equipment, following manufacturer's protocols, such as by using the Affymetrix GeneChip technology, or Incyte's microarray technology.

[0159] Alternatively, protein levels can be determined by constructing an antibody microarray in which binding sites comprise immobilized, preferably monoclonal, antibodies specific to a plurality of protein species encoded by the cell genome. Preferably, antibodies are present for a substantial fraction of the proteins of interest. Methods for making monoclonal antibodies are well known (see, e.g., Harlow and Lane, 1988, ANTIBODIES: A LABORATORY MANUAL, Cold Spring Harbor, N.Y., which is incorporated in its entirety for all purposes). In one embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed based on genomic sequence of the cell. With such an antibody array, proteins from the cell are contacted to the array, and their binding is assayed with assays known in the art. Generally, the expression, and the level of expression, of proteins of diagnostic or prognostic interest can be detected through immunohistochemical staining of tissue slices or sections.

[0160] Finally, levels of transcripts of marker genes in a number of tissue specimens may be characterized using a "tissue array" (Kononen et al., Nat. Med. 4(7):844-7 (1998)). In a tissue array, multiple tissue samples are assessed on the same microarray. The arrays allow in situ detection of RNA and protein levels; consecutive sections allow the analysis of multiple samples simultaneously.

[0161] In other embodiments, Serial Analysis of Gene Expression (SAGE) is used to measure gene expression. Serial analysis of gene expression (SAGE) is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many transcripts are linked together to form long serial molecules that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g. Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), the contents of each of which are incorporated by reference herein in their entirety.

[0162] In other embodiments, Massively Parallel Signature Sequencing (MPSS) is used to measure gene expression. This method, described by Brenner et al., Nature Biotechnology 18:630-634 (2000), is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.

[0163] Immunohistochemistry methods are also suitable for detecting the expression levels of the gene products of the present invention. Thus, antibodies (monoclonal or polyclonal) or antisera, such as polyclonal antisera, specific for each marker are used to detect expression. The antibodies can be detected by direct labeling of the antibodies themselves, for example, with radioactive labels, fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera or a monoclonal antibody specific for the primary antibody. Immunohistochemistry protocols and kits are well known in the art and are commercially available.

[0164] In certain embodiments, a proteomics approach is used to measure gene expression. A proteome refers to the totality of the proteins present in a sample (e.g. tissue, organism, or cell culture) at a certain point of time. Proteomics includes, among other things, study of the global changes of protein expression in a sample (also referred to as expression proteomics). Proteomics typically includes the following steps: (1) separation of individual proteins in a sample by 2-D gel electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from the gel, e.g. by mass spectrometry or N-terminal sequencing; and (3) analysis of the data using bioinformatics. Proteomics methods are valuable supplements to other methods of gene expression profiling, and can be used, alone or in combination with other methods, to detect the products of the prognostic markers of the present invention.

[0165] In some embodiments, mass spectrometry (MS) analysis can be used alone or in combination with other methods (e.g., immunoassays or RNA measuring assays) to determine the presence and/or quantity of the one or more biomarkers disclosed herein in a biological sample. In some embodiments, the MS analysis includes matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS analysis, such as for example direct-spot MALDI-TOF or liquid chromatography MALDI-TOF mass spectrometry analysis. In some embodiments, the MS analysis comprises electrospray ionization (ESI) MS, such as for example liquid chromatography (LC) ESI-MS. Mass analysis can be accomplished using commercially-available spectrometers. Methods for utilizing MS analysis, including MALDI-TOF MS and ESI-MS, to detect the presence and quantity of biomarker peptides in biological samples are known in the art. See for example U.S. Pat. Nos. 6,925,389; 6,989,100; and 6,890,763 for further guidance, each of which is incorporated by reference herein in their entirety.

Phenotypic Traits

[0166] In certain embodiments, methods of the invention assess risk of female infertility by correlating assay results with an analysis of a phenotypic trait or environmental exposure that may be associated with infertility. Exemplary phenotypic traits or environmental exposures are shown in Table 8.

TABLE 8
Phenotypic and environmental variables impacting fertility success

Cholesterol levels on different days of the menstrual cycle
Age of first menses for patient and female blood relatives (e.g. sisters, mother, grandmothers)
Age of menopause for female blood relatives (e.g. sisters, mother, grandmothers)
Number of previous pregnancies (biochemical/ectopic/clinical/fetal heartbeat detected, live birth outcomes), age at the time, and outcome for patient and female blood relatives (e.g. sisters, mother, grandmothers)
Diagnosis of Polycystic Ovarian Syndrome
History of hydrosalpinx or tubal occlusion
History of endometriosis, pelvic pain, or painful periods
Cancer history/type of cancer/treatment/outcome for patient and female blood relatives (e.g. sisters, mother, grandmothers)
Age that sexual activity began, current level of sexual activity
Smoking history for patient and blood relatives
Travel schedule/number of flying hours a year/time difference changes of more than 3 hours (jetlag and flight-associated radiation exposure)
Nature of periods (length of menses, length of cycle)
Biological age (number of years since first menses)
Birth control use
Drug use (illegal or legal)
Body mass index (current, lowest ever, highest ever)
History of polyps
History of hormonal imbalance
History of amenorrhoea
History of eating disorders
Alcohol consumption by patient or blood relatives
Details of mother's pregnancy with patient (i.e. measures of uterine environment): any drugs taken, smoking, alcohol, stress levels, exposure to plastics (i.e. Tupperware), composition of diet (see below)
Sleep patterns: number of hours a night, continuous/overall
Diet: meat, organic produce, vegetables, vitamin or other supplement consumption, dairy (full fat or reduced fat), coffee/tea consumption, folic acid, sugar (complex, artificial, simple), processed food versus home cooked
Exposure to plastics: microwave in plastic, cook with plastic, store food in plastic, plastic water or coffee mugs
Water consumption: amount per day, format: straight from the tap, bottled water (plastic or bottle), filtered (type: e.g. Britta/Pur)
Residence history starting with mother's pregnancy: location/duration
Environmental exposure to potential toxins for different regions (extracted from government monitoring databases)
Health metrics: autoimmune disease, chronic illness/condition

TABLE 8-continued
Phenotypic and environmental variables impacting fertility success
Pelvic surgery history
Life time number of pelvic X-rays
History of sexually transmitted infections: type/treatment/outcome
Reproductive hormone levels: follicle stimulating hormone, anti-Müllerian hormone, estrogen, progesterone
Stress
Thickness and type of endometrium throughout the menstrual cycle
Age
Height
Fertility treatment history and details: history of hormone stimulation, brand of drugs used, basal antral follicle count, follicle count after stimulation with different protocols, number/quality/stage of retrieved oocytes/development profile of embryos resulting from in vitro insemination (natural or ICSI), details of IVF procedure (which clinic, doctor/embryologist at clinic, assisted hatching, fresh or thawed oocytes/embryos, embryo transfer (blood on the catheter/squirt detection and direction on ultrasound)), number of successful and unsuccessful IVF attempts
Morning sickness during pregnancy
Breast size before/during/after pregnancy
History of ovarian cysts
Twin or sibling from multiple birth (mono-zygotic or di-zygotic)
Male factor infertility for reproductive partner: semen analysis (count, motility, morphology), vasectomy, male cancer, smoking, alcohol, diet, STIs
Blood type
DES exposure in utero
Past and current exercise/athletic history
Levels of phthalates, including metabolites: MEP (monoethyl phthalate), MECPP (mono(2-ethyl-5-carboxypentyl) phthalate), MEHHP (mono(2-ethyl-5-hydroxyhexyl) phthalate), MEOHP (mono(2-ethyl-5-oxohexyl) phthalate), MBP (monobutyl phthalate), MBzP (monobenzyl phthalate), MEHP (mono(2-ethylhexyl) phthalate), MiBP (mono-isobutyl phthalate), MCPP (mono(3-carboxypropyl) phthalate), MCOP (monocarboxyisooctyl phthalate), MCNP (monocarboxyisononyl phthalate)
Familial history of Premature Ovarian Failure/Insufficiency
Autoimmunity history: antiadrenal antibodies (anti-21-hydroxylase antibodies), antiovarian antibodies, antithyroid antibodies (anti-thyroid peroxidase, antithyroglobulin)
Hormone levels: luteinizing hormone (using immunofluorometric assay), Δ4-androstenedione (using radioimmunoassay), dehydroepiandrosterone (using radioimmunoassay), and inhibin B (commercial ELISA)
Number of years trying to conceive
Dioxin and PVC exposure
Hair color
Nevi (moles)
Lead, cadmium, and other heavy metal exposure
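
By way of illustration only, two of the Table 8 variables are derived quantities rather than direct answers: biological age (defined in the table as the number of years since first menses) and body mass index. The following minimal Python sketch shows how such values might be computed from a questionnaire record. The class name, the field names, and the choice of kilograms and meters are assumptions made for this example and do not appear in the source; BMI is computed with the standard definition of weight divided by height squared.

```python
from dataclasses import dataclass


@dataclass
class QuestionnaireResponse:
    """Hypothetical subset of questionnaire answers (field names are illustrative)."""
    age_years: float
    age_at_first_menses_years: float
    height_m: float
    weight_kg: float


def biological_age(resp: QuestionnaireResponse) -> float:
    # Table 8 defines biological age as the number of years since first menses.
    return resp.age_years - resp.age_at_first_menses_years


def body_mass_index(resp: QuestionnaireResponse) -> float:
    # Standard BMI definition: weight in kilograms divided by height in meters squared.
    return resp.weight_kg / resp.height_m ** 2


# Example record with made-up values.
resp = QuestionnaireResponse(
    age_years=34,
    age_at_first_menses_years=13,
    height_m=1.65,
    weight_kg=60.0,
)
print(biological_age(resp))             # 21
print(round(body_mass_index(resp), 2))  # 22.04
```

Derived values of this kind could then be supplied, alongside assay results, to whichever statistical analysis is chosen; the sketch makes no assumption about that downstream step.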

[0167] Information regarding the fertility-associated phenotypic traits of the female, such as those listed in Table 8, can be obtained by any means known in the art. In many cases, such information can be obtained from a questionnaire completed by the subject that contains questions regarding certain fertility-associated phenotypic traits. Additional information can be obtained from a questionnaire completed by the subject's partner and blood relatives. The questionnaire includes questions regarding the subject's fertility-associated phenotypic traits, such as her age, smoking habits, or frequency of alcohol consumption. Information can also be obtained from the medical history of the subject, as well as the medical history of blood relatives and other family members. Additional information can be obtained from the medical history and family medical history of the subject's partner. Medical history information can be obtained through analysis of electronic medical records, paper medical records, a series of questions about medical history included in the questionnaire, and a combination thereof. In other cases, the information can be obtained by analyzing a sample collected from the female subject, reproductive partners of the subject, blood relatives of the subject, and a combination thereof. The sample may include human tissue or bodily fluid. Any of the assays described herein may be used to obtain the phenotypic trait.

[0168] In other embodiments, an assay specific to an environmental exposure is used to obtain the phenotypic trait of interest. Such assays are known to those of skill in the art, and may be used with methods of the invention. For example, the hormones used in birth control pills (estrogen and progesterone) may be detected from a urine or blood test. Venners et al. (Hum. Reprod. 21(9):2272-2280, 2006) report assays for detecting estrogen and progesterone in urine and blood samples. Venners et al. also report assays for detecting the chemicals used in fertility treatments.

[0169] Similarly, illicit drug use may be detected from a tissue or body fluid, such as hair, urine, sweat, or blood, and there are numerous commercially available assays (LabCorp) for conducting such tests. Standard drug tests look for ten different classes of drugs, and the test is commercially known as a "10-panel urine screen". The 10-panel urine screen consists of the following:
1. Amphetamines (including Methamphetamine)
2. Barbiturates
3. Benzodiazepines
4. Cannabinoids (THC)
5. Cocaine
6. Methadone
7. Methaqualone
8. Opiates (Codeine, Morphine, Heroin, Oxycodone, Vicodin, etc.)
9. Phencyclidine (PCP)
10. Propoxyphene
Use of alcohol can also be detected by such tests.

[0170] Numerous assays can be used to test a patient's exposure to plastics (e.g., Bisphenol A (BPA)). BPA is most commonly found as a component of polycarbonates (about

74% of total BPA produced) and in the production of epoxy resins (about 20%). As well as being found in a myriad of products including plastic food and beverage containers (including baby and water bottles), BPA is also commonly found in various household appliances, electronics, sports safety equipment, adhesives, cash register receipts, medical devices, eyeglass lenses, water supply pipes, and many other products. Assays for testing blood, sweat, or urine for presence of BPA are described, for example, in Genuis et al. (Journal of Environmental and Public Health, Volume 2012, Article ID 185731, 10 pages, 2012).

[0171] Association studies can be performed to analyze the effect of genetic mutations or abnormal gene expression on a particular trait being studied. Infertility as a trait may be analyzed as a non-continuous variable in a case-control study that includes infertile females as the patients and, as controls, fertile females that are age and ethnically matched. Methods including logistic regression analysis and Chi square tests may be used to identify an association between genetic mutations or abnormal gene expression and infertility. In addition, when using logistic regression, adjustments for covariates like age, smoking, BMI and other factors that affect infertility, such as those shown in Table 4, may be included in the analysis.

[0172] In addition, haplotype effects can be estimated using programs such as Haploscore. Alternatively, programs such as Haploview and Phase can be used to estimate haplotype frequencies, and then further analysis such as a Chi square test can be performed. Logistic regression analysis may be used to generate an odds ratio and relative risk for each genetic variant or variants.

[0173] The association between genetic mutations and/or abnormal gene expression and infertility may be analyzed within cases only or by comparing cases and controls using analysis of variance. Such analysis may include adjustments for covariates like age, smoking, BMI and other factors that affect infertility. In addition, haplotype effects can be estimated using programs such as Haploscore.

[0174] Methods of logistic regression are described, for example, in Ruczinski (Journal of Computational and Graphical Statistics 12:475-512, 2003); Agresti (An Introduction to Categorical Data Analysis, John Wiley & Sons, Inc., 1996, New York, Chapter 8); and Yeatman et al. (U.S. patent application number 2006/0195269), the content of each of which is hereby incorporated by reference in its entirety.

[0175] Other algorithms for analyzing associations are known. For example, stochastic gradient boosting is used to generate multiple additive regression tree (MART) models to predict a range of outcome probabilities. Each tree is a recursive graph of decisions, the possible consequences of which partition patient parameters; each node represents a question (e.g., is the FSH level greater than X?) and the branch taken from that node represents the decision made (e.g., yes or no). The choice of question corresponding to each node is automated. A MART model is the weighted sum of iteratively produced regression trees. At each iteration, a regression tree is fitted according to a criterion in which the samples more involved in the prediction error are given priority. This tree is added to the existing trees, the prediction error is recalculated, and the cycle continues, leading to a progressive refinement of the prediction. The strengths of this method include analysis of many variables without knowledge of their complex interactions beforehand.

[0176] A different approach, called the generalized linear model, expresses the outcome as a weighted sum of functions of the predictor variables. The weights are calculated based on least squares or Bayesian methods to minimize the prediction error on the training set. A predictor's weight reveals the effect of changing that predictor, while holding the others constant, on the outcome. In cases where one or more predictors are highly correlated, in a phenomenon known as collinearity, the relative values of their weights are less meaningful; steps must be taken to remove that collinearity, such as by excluding the nearly redundant variables from the model. Thus, when properly interpreted, the weights express the relative importance of the predictors. Less general formulations of the generalized linear model include linear regression, multiple regression, and multifactor logistic regression models, and are widely used in the medical community as clinical predictors.

Microarrays

[0177] In certain aspects, the invention provides a microarray including a plurality of oligonucleotides attached to a substrate at discrete addressable positions, in which at least one of the oligonucleotides hybridizes to a portion of a genetic region from Table 1 that includes an infertility-associated mutation.

[0178] Methods of constructing microarrays are known in the art. See for example Yeatman et al. (U.S. patent application number 2006/0195269), the content of which is hereby incorporated by reference in its entirety.

[0179] Microarrays are prepared by selecting probes that include a polynucleotide sequence, and then immobilizing such probes to a solid support or surface. For example, the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA and/or RNA analogues, or combinations thereof. For example, the polynucleotide sequences of the probes may be full or partial fragments of genomic DNA. The polynucleotide sequences of the probes may also be synthesized nucleotide sequences, such as synthetic oligonucleotide sequences. The probe sequences can be synthesized either enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.

[0180] The probe or probes used in the methods of the invention are preferably immobilized to a solid support, which may be either porous or non-porous. For example, the probes of the invention may be polynucleotide sequences, which are attached to a nitrocellulose or nylon membrane or filter covalently at either the 3' or the 5' end of the polynucleotide. Such hybridization probes are well known in the art (see, e.g., Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)). Alternatively, the solid support or surface may be a glass or plastic surface. In a particularly preferred embodiment, hybridization levels are measured to microarrays of probes consisting of a solid phase on the surface of which are immobilized a population of polynucleotides, such as a population of DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics. The solid phase may be a nonporous or, optionally, a porous material such as a gel.

[0181] In preferred embodiments, a microarray comprises a support or surface with an ordered array of binding (e.g., hybridization) sites or "probes" each representing one of the genes described herein, particularly the genes described in Table 1. Preferably the microarrays are addressable arrays, and more preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position in the array (i.e., on the support or surface). In preferred embodiments, each probe is covalently attached to the solid support at a single site.

[0182] Microarrays can be made in a number of ways, of which several are described below. However produced, microarrays share certain characteristics. The arrays are reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions. The microarrays are preferably small, e.g., between 1 cm² and 25 cm², between 12 cm² and 13 cm², or 3 cm². However, larger arrays are also contemplated and may be preferable, e.g., for use in screening arrays. Preferably, a given binding site or unique set of binding sites in the microarray will specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific mRNA, or to a specific cDNA derived therefrom). However, in general, other related or similar sequences will cross hybridize to a given binding site.

[0183] The microarrays of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected. Preferably, the position of each probe on the solid surface is known. Indeed, the microarrays are preferably positionally addressable arrays. Specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface).

[0184] According to the invention, the microarray is an array (i.e., a matrix) in which each position represents one of the biomarkers described herein. For example, each position can contain a DNA or DNA analogue based on genomic DNA to which a particular RNA or cDNA transcribed from that genetic marker can specifically hybridize. The DNA or DNA analogue can be, e.g., a synthetic oligomer or a gene fragment. In one embodiment, probes representing each of the markers are present on the array. In a preferred embodiment, the array comprises probes for each of the genes listed in Table 1.

[0185] As noted above, the probe to which a particular polynucleotide molecule specifically hybridizes according to the invention contains a complementary genomic polynucleotide sequence. The probes of the microarray preferably consist of nucleotide sequences of no more than 1,000 nucleotides. In some embodiments, the probes of the array consist of nucleotide sequences of 10 to 1,000 nucleotides. In a preferred embodiment, the nucleotide sequences of the probes are in the range of 10-200 nucleotides in length and are genomic sequences of a species of organism, such that a plurality of different probes is present, with sequences complementary and thus capable of hybridizing to the genome of such a species of organism, sequentially tiled across all or a portion of such genome. In other specific embodiments, the probes are in the range of 10-30 nucleotides in length, in the range of 10-40 nucleotides in length, in the range of 20-50 nucleotides in length, in the range of 40-80 nucleotides in length, in the range of 50-150 nucleotides in length, in the range of 80-120 nucleotides in length, and most preferably are 60 nucleotides in length.

[0186] The probes may comprise DNA or DNA "mimics" (e.g., derivatives and analogues) corresponding to a portion of an organism's genome. In another embodiment, the probes of the microarray are complementary RNA or RNA mimics. DNA mimics are polymers composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of specific hybridization with RNA. The nucleic acids can be modified at the base moiety, at the sugar moiety, or at the phosphate backbone. Exemplary DNA mimics include, e.g., phosphorothioates.

[0187] DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of genomic DNA or cloned sequences. PCR primers are preferably chosen based on a known sequence of the genome that will result in amplification of specific fragments of genomic DNA. Computer programs that are well known in the art are useful in the design of primers with the required specificity and optimal amplification properties, such as Oligo version 5.0 (National Biosciences). Typically each probe on the microarray will be between 10 bases and 50,000 bases, usually between 300 bases and 1,000 bases in length. PCR methods are well known in the art, and are described, for example, in Innis et al., eds., PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press Inc., San Diego, Calif. (1990). It will be apparent to one skilled in the art that controlled robotic systems are useful for isolating and amplifying nucleic acids.

[0188] An alternative, preferred means for generating the polynucleotide probes of the microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acid Res. 14:5399-5407 (1986); McBride et al., Tetrahedron Lett. 24:246-248 (1983)). Synthetic sequences are typically between about 10 and about 500 bases in length, more typically between about 20 and about 100 bases, and most preferably between about 40 and about 70 bases in length. In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no means limited to, inosine. As noted above, nucleic acid analogues may be used as binding sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic acid (see, e.g., Egholm et al., Nature 363:566-568 (1993); U.S. Pat. No. 5,539,083).

[0189] Probes are preferably selected using an algorithm that takes into account binding energies, base composition, sequence complexity, cross-hybridization binding energies, and secondary structure. See Friend et al., International Patent Publication WO 01/05935, published Jan. 25, 2001; Hughes et al., Nat. Biotech. 19:342-7 (2001).

[0190] A skilled artisan will also appreciate that positive control probes, e.g., probes known to be complementary and hybridizable to sequences in the target polynucleotide molecules, and negative control probes, e.g., probes known to not be complementary and hybridizable to sequences in the target polynucleotide molecules, should be included on the array. In one embodiment, positive controls are synthesized along the perimeter of the array. In another embodiment, positive controls are synthesized in diagonal stripes across the array. In still another embodiment, the reverse complement for each probe is synthesized next to the position of the probe to serve as a negative control. In yet another embodiment, sequences from other species of organism are used as negative controls or as "spike-in" controls.

[0191] The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material. A preferred method for attaching the nucleic acids to a surface is by printing on glass plates, as is described generally by Schena et al., Science 270:467-470 (1995). This method is especially useful for preparing microarrays of cDNA (see also DeRisi et al., Nature Genetics 14:457-460 (1996); Shalon et al., Genome Res. 6:639-645 (1996); and Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286 (1995)).

[0192] A second preferred method for making microarrays is by making high-density oligonucleotide arrays. Techniques are known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface, using photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature Biotechnology 14:1675; U.S. Pat. Nos. 5,578,832; 5,556,752; and 5,510,270) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., Biosensors & Bioelectronics 11:687-690). When these methods are used, oligonucleotides (e.g., 60-mers) of known sequence are synthesized directly on a surface such as a derivatized glass slide. Usually, the array produced is redundant, with several oligonucleotide molecules per RNA.

[0193] Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nuc. Acids. Res. 20:1679-1684), may also be used. In principle, and as noted supra, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)) could be used. However, as will be recognized by those skilled in the art, very small arrays will frequently be preferred because hybridization volumes will be smaller.

[0194] In one embodiment, the arrays of the present invention are prepared by synthesizing polynucleotide probes on a support. In such an embodiment, polynucleotide probes are attached to the support covalently at either the 3' or the 5' end of the polynucleotide.

[0195] In a particularly preferred embodiment, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in U.S. Pat. No. 6,028,189; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123. Specifically, the oligonucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in "microdroplets" of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells, which define the locations of the array elements (i.e., the different probes). Microarrays manufactured by this ink-jet method are typically of high density, preferably having a density of at least about 2,500 different probes per 1 cm². The polynucleotide probes are attached to the support covalently at either the 3' or the 5' end of the polynucleotide.

[0196] The polynucleotide molecules which may be analyzed by the present invention are DNA, RNA, or protein. The target polynucleotides are detectably labeled at one or more nucleotides. Any method known in the art may be used to detectably label the target polynucleotides. Preferably, this labeling incorporates the label uniformly along the length of the DNA or RNA, and more preferably, the labeling is carried out at a high degree of efficiency.

[0197] In a preferred embodiment, the detectable label is a luminescent label. For example, fluorescent labels, bioluminescent labels, chemiluminescent labels, and colorimetric labels may be used in the present invention. In a highly preferred embodiment, the label is a fluorescent label, such as a fluorescein, a phosphor, a rhodamine, or a polymethine dye derivative. Examples of commercially available fluorescent labels include, for example, fluorescent phosphoramidites such as FluorePrime (Amersham Pharmacia, Piscataway, N.J.), Fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or Cy5 (Amersham Pharmacia, Piscataway, N.J.). In another embodiment, the detectable label is a radiolabeled nucleotide.

[0198] In a further preferred embodiment, target polynucleotide molecules from a patient sample are labeled differentially from target polynucleotide molecules of a reference sample. The reference can comprise target polynucleotide molecules from normal tissue samples.

[0199] Nucleic acid hybridization and wash conditions are chosen so that the target polynucleotide molecules specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to a specific array site, wherein its complementary DNA is located.

[0200] Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self complementary sequences.

[0201] Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of probe and target nucleic acids. One of skill in the art will appreciate that as the oligonucleotides become shorter, it may become necessary to adjust their length to achieve a relatively uniform melting temperature for satisfactory hybridization results. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989), and in Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994). Typical hybridization conditions for the cDNA microarrays of Schena et al. are hybridization in 5×SSC plus 0.2% SDS at 65° C. for four hours, followed by washes at 25° C. in low stringency wash buffer (1×SSC plus 0.2% SDS), followed by 10 minutes at 25° C. in higher stringency wash buffer (0.1×SSC plus 0.2% SDS) (Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10614

(1993)). Useful hybridization conditions are also provided in, e.g., Tijessen, 1993, HYBRIDIZATION WITH NUCLEIC ACID PROBES, Elsevier Science Publishers B.V., and Kricka, 1992, NONISOTOPIC DNA PROBE TECHNIQUES, Academic Press, San Diego, Calif.

[0202] Particularly preferred hybridization conditions include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5° C., more preferably within 2° C.) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.

[0203] When fluorescently labeled genetic regions or products of these genetic regions are used, the fluorescence emissions at each site of a microarray may be, preferably, detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser may be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores, and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, "A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization," Genome Research 6:639-645, which is incorporated by reference in its entirety for all purposes). In a preferred embodiment, the arrays are scanned with a laser fluorescent scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Fluorescence laser scanning devices are described in Schena et al., Genome Res. 6:639-645 (1996), and in other references cited herein. Alternatively, the fiber-optic bundle described by Ferguson et al., Nature Biotech. 14:1681-1684 (1996), may be used to monitor mRNA abundance levels at a large number of sites simultaneously.

Computer Systems

[0204] FIG. 15 illustrates a computer system 401 useful for implementing methodologies described herein. A system of the invention may include any one or any number of the components shown in FIG. 15. Generally, a system 401 may include a computer 433 and a server computer 409 capable of communication with one another over network 415. Additionally, data may optionally be obtained from a database 405 (e.g., local or remote). In some embodiments, systems include an instrument 455 for obtaining sequencing data, which may be coupled to a sequencer computer 451 for initial processing of sequence reads.

[0205] In some embodiments, methods are performed by parallel processing and server 409 includes a plurality of processors with a parallel architecture, i.e., a distributed network of processors and storage capable of collecting, filtering, processing, analyzing, and ranking genetic data obtained through methods of the invention. The system may include a plurality of processors configured to, for example, 1) collect genetic data from different modalities: a) one or more infertility databases 405 (e.g., infertility databases, including private and public fertility-related data), b) from one or more sequencers 455 or sequencing computers 451, c) from mouse modeling, etc.; 2) filter the genetic data to identify genetic variations; 3) associate genetic variations with infertility using methods described throughout the application (e.g., filtering, clustering, etc.); 4) determine statistical significance of genetic variations based on fertility criteria defined herein (e.g., Example 18); and 5) characterize/identify the genetic variations as infertility biomarkers.

[0206] By leveraging genetic data sets obtained across different sources, applying layers of analyses (i.e., filtering, clustering, etc.) to genetic data, and quantifying/qualifying statistical significance of that genetic data, systems of the invention are able to yield and identify new infertility biomarkers that previously could not be determined to have any association with infertility. For example, methods of the invention utilize data sets from different modalities. The data sets include data obtained from infertility databases (e.g., public and private), sequencing data (e.g., whole genome sequencing from one or more biological samples), and genetic data obtained from mouse modeling, etc. Several layers of analysis are then applied to the genetic data to identify whether variations are potentially associated with infertility. Particularly, the genetic data sets are subject to evolutionary conservation analysis, filtering analysis (see FIG. 5) and/or clustering analysis (Example 20). After those analyses are applied, the variants potentially associated with infertility are then assessed for biological and statistical significance. The variants that are determined to be statistically significant are then classified as infertility biomarkers, even if those variants had no prior association with infertility. Accordingly, using the invention's multi-modal and layered analysis, one is able to identify infertility biomarkers that would not have been identified or associated with infertility using standard techniques (i.e., comparing genetic sequences of an abnormal, infertile population to genetic sequences of a normal, fertile population).

[0207] While other hybrid configurations are possible, the main memory in a parallel computer is typically either shared between all processing elements in a single address space, or distributed, i.e., each processing element has its own local address space. (Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well.) Distributed shared memory and memory virtualization combine the two approaches, where the processing element has its own local memory and access to the memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory.

[0208] Computer architectures in which each element of main memory can be accessed with equal latency and bandwidth are known as Uniform Memory Access (UMA) systems. Typically, that can be achieved only by a shared memory system, in which the memory is not physically distributed. A system that does not have this property is known as a Non-Uniform Memory Access (NUMA) architecture. Distributed memory systems have non-uniform memory access.

[0209] Processor-processor and processor-memory communication can be implemented in hardware in several ways, including via shared (either multiported or multiplexed) memory, a crossbar switch, a shared bus or an interconnect network of a myriad of topologies including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), or n-dimensional mesh.

[0210] Parallel computers based on interconnected networks must incorporate routing to enable the passing of messages between nodes that are not directly connected. The medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines. Such resources are commercially available for purchase for dedicated use, or these resources can be accessed via "the cloud," e.g., Amazon Cloud Computing.

ng/μl trypsin in 50 mM ammonium bicarbonate overnight at 37° C.), and the peptides are extracted and microsequenced.

[0211] A computer generally includes a processor coupled to a memory and an input-output (I/O) mechanism via a bus.
Example 2 Memory can include RAM or ROM and preferably includes at least one tangible, non-transitory medium storing instruc Sample Population for Identification of tions executable to cause the system to perform functions Infertility-Related Polymorphisms described herein. As one skilled in the art would recognize as necessary or best-suited for performance of the methods of 0217 Genomic DNA is collected from 30 female subjects the invention, systems of the invention include one or more (15 who have failed multiple rounds of IVF versus 15 who processors (e.g., a central processing unit (CPU), a graphics were successful). In particular, all of the Subjects are under processing unit (GPU), etc.), computer-readable storage age 38. Members of the control group Succeeded in conceiv devices (e.g., main memory, static memory, etc.), or combi ing through IVF. Members of the test group have a clinical nations thereof which communicate with each other via abus. diagnosis of idiopathic infertility, and have failed three of 0212 A processor may be any Suitable processor known in more rounds of IVF with no prior pregnancy. The women are the art, such as the processor sold under the trademarkXEON able to produce eggs for IVF and have a reproductively nor E7 by Intel (Santa Clara, Calif.) or the processor sold under mal male partner. To focus on infertility resulting from oocyte the trademark OPTERON 6200 by AMD (Sunnyvale, Calif.). defects (and eliminate factors such as implantation defects) 0213 Input/output devices according to the invention may women who have Subsequently conceived by egg donation include a video display unit (e.g., a liquid crystal display are favored. 
(LCD) or a cathode ray tube (CRT) monitor), an alphanu Example 3 meric input device (e.g., a keyboard), a cursor control device (e.g., a mouse or trackpad), a disk drive unit, a signal genera Sample Population for Identification of tion device (e.g., a speaker), a touchscreen, an accelerometer, Infertility-Related Polymorphisms a microphone, a cellular radio frequency antenna, and a net work interface device, which can be, for example, a network 0218. In a follow-up study of a larger cohort, genomic interface card (NIC), Wi-Fi card, or cellular modem. DNA is collected from 300 female subjects (divided into groups having profiles similar to the groups described above). INCORPORATION BY REFERENCE The DNA sequence polymorphisms to be investigated are 0214) References and citations to other documents, such selected based on the results of small initial studies. as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this Example 4 disclosure. All Such documents are hereby incorporated Sample Population for Identification of Premature herein by reference in their entirety for all purposes. Ovarian Failure (POF) and Premature Maternal EQUIVALENTS Aging Polymorphisms 0215. The invention may be embodied in other specific 0219 Genomic DNA is collected from 30 female subjects forms without departing from the spirit or essential charac who are experiencing symptoms of premature decline in egg teristics thereof. The foregoing embodiments are therefore to quality and reserve including abnormal menstrual cycles or be considered in all respects illustrative rather than limiting amenorrhea. In particular, all of the subjects are between the on the invention described herein. 
Scope of the invention is ages of 15-40 and have follicle stimulating hormone (FSH) thus indicated by the appended claims rather than by the levels of over 20 international units (IU) and a basal antral foregoing description, and all changes which come within the follicle count of under 5. Members of the control group suc meaning and range of equivalency of the claims are therefore ceeded in conceiving through IVF. Members of the test group intended to be embraced therein. have no previous history of toxic exposure to known fertility damaging treatments such as chemotherapy. Members of this EXAMPLES group may also have one or more female family member who experienced menopause before the age of 40. Example 1 Example 5 Identification of Oocyte Proteins Sample Procurement and Preparation 0216 Oocytes are collected from females, for example mice, by Superovulation, and Zona pellucidae are removed by 0220 Blood is drawn from patients at fertility clinics for treatment with acid Tyrode Solution. Oocyte plasma mem standard procedures such as gauging hormone levels and brane (oolemma) proteins exposed on the Surface can be many clinics bank this material after consent for future distinguished at this point by biotin labeling. The treated research projects. Although DNA is easily obtained from oocytes are washed in 0.01 M PBS and treated with lysis blood, wider population sampling is accomplished using buffer (7 Murea, 2 M thiourea, 4% (w/v) 3-(3-cholami home-based, noninvasive methods of DNA collection such as dopropyl)dimethylammonio-1-propanesulfonate (CHAPS), saliva using an Oragene DNA self collection kit (DNA 65 mM dithiothreitol (DTT), and 1% (v/v) protease inhibitor Genotek). at -80°C.). Oocyte proteins are resolved by one-dimensional 0221 Blood samples Three-milliliter whole blood or two-dimensional SDS-PAGE. The gels are stained, visu samples are venously collected and treated with sodium cit alized, and sliced. 
Proteins in the gel pieces are digested (12.5 rate anticoagulant and stored at 4°C. until DNA extraction. US 2015/O 142331 A1 May 21, 2015 47

0222 Whole saliva: Whole saliva is collected using the Oragene DNA self-collection kit following the manufacturer's instructions. Participants are asked to rub their tongues around the inside of their mouths for about 15 sec and then deposit approximately 2 ml saliva into the collection cup. The collection cup is designed so that the solution from the vial's lower compartment is released and mixes with the saliva when the cap is securely fastened. This starts the initial phase of DNA isolation, and stabilizes the saliva sample for long-term storage at room temperature or in low temperature freezers. Whole saliva samples are stored and shipped, if necessary, at room temperature. Whole saliva has the potential advantage over other non-invasive DNA sampling methods, such as buccal and oral rinse, of providing large numbers of nucleated cells (e.g., epithelial cells, leukocytes) per sample.

0223 Blood clots: Clotted blood that is usually discarded after extraction through serum separation, for other laboratory tests such as for monitoring reproductive hormone levels, is collected and stored at -80° C. until extraction.

0224 Sample preparation: Genomic DNA is prepared from patient blood or saliva for downstream sequencing applications with commercially available kits (e.g., Invitrogen's ChargeSwitch® gDNA Blood Kit or DNA Genotek kits, respectively). Genomic DNA from clotted blood is prepared by standard methods involving proteinase K digestion, salt/chloroform extraction and 90% ethanol precipitation of DNA (see N. Kanai et al., 1994, "Rapid and simple method for preparation of genomic DNA from easily obtainable clotted blood," J Clin Pathol 47:1043-1044, which is incorporated by reference in its entirety for all purposes).

Example 6
Manufacturing of a Customized Oligonucleotide Library

0225 A customized oligonucleotide library can be used to enrich samples for DNAs of interest. Several methods for manufacturing customized oligonucleotide libraries are known in the art. In one example, NimbleGen sequence capture custom array design is used to create a customized target enrichment system tailored to infertility related genes. A customized library of oligonucleotides is designed to target genetic regions of Tables 1-7. The custom DNA oligonucleotides are synthesized on a high density DNA NimbleGen Sequence Capture Array with Maskless Array Synthesizer (MAS) technology. The NimbleGen Sequence Capture Array system workflow is array based and is performed on glass slides with an X1 mixer (Roche NimbleGen) and the NimbleGen Hybridization System.

0226 In a similar example, Agilent's eArray (a web-based design tool) is used to create a customized target enrichment system tailored to infertility related genes. A customized library is designed to target genetic regions of Tables 1-7. The custom RNA oligonucleotides, or baits, are biotinylated for easy capture onto streptavidin-labeled magnetic beads and used in Agilent's SureSelect Target Enrichment System. The SureSelect Target Enrichment System workflow is solution-based and is performed in microcentrifuge tubes or microtiter plates.

Example 7
Capture of Genomic DNA

0227 Genomic DNA is sheared and assembled into a library format specific to the sequencing instrument utilized downstream. Size selection is performed on the sheared DNA and confirmed by electrophoresis or other size detection method.

0228 Several methods to capture genomic DNA are known in the art. In one example, the size-selected DNA is purified and the ends are ligated to annealed oligonucleotide linkers from Illumina to prepare a DNA library. DNA-adaptor ligated fragments are hybridized to a NimbleGen Sequence Capture array using an X1 mixer (Roche NimbleGen) and the Roche NimbleGen Hybridization System. After hybridization, the arrays are washed and DNA fragments bound to the array are eluted with elution buffer. The captured DNA is then dried by centrifugation, rehydrated and PCR amplified with polymerase. Enrichment of DNA can be assessed by quantitative PCR comparison to the same sample prior to hybridization.

0229 In a similar example, the size-selected DNA is incubated with biotinylated RNA oligonucleotide "baits" for 24 hours. The RNA/DNA hybrids are immobilized to streptavidin-labeled magnetic beads, which are captured magnetically. The RNA baits are then digested, leaving only the target selected DNA of interest, which is then amplified and sequenced.

Example 8
Sequencing of Target Selected DNA

0230 Target-selected DNA is sequenced by a paired-end (50 bp) re-sequencing procedure using Illumina's Genome Analyzer. The combined DNA targeting and resequencing provides 45-fold redundancy, which is greater than the accepted industry standard for SNP discovery.

Example 9
Correlation of Polymorphisms with Fertility

0231 Polymorphisms among the sequences of target selected DNA from the pool of test subjects are identified, and may be classified according to where they occur in promoters, splice sites, or coding regions of a gene. Polymorphisms can also occur in regions that have no apparent function, such as introns and upstream or downstream non-coding regions. Although such polymorphisms may not be informative as to the functional defect of an allele, nevertheless, they are linked to the defect and useful for predicting infertility. The polymorphisms are analyzed statistically to determine their correlation with the fertility status of the test subjects. The statistical analysis indicates that certain polymorphisms identify gene defects that by themselves (homozygous or heterozygous) are sufficient to cause infertility. Other polymorphisms identify genetic variants that reduce, but do not eliminate, fertility. Other polymorphisms identify genetic variants that have an apparent effect on fertility only in the presence of particular variants of other genes. Other polymorphisms identify genetic variants that have an apparent effect on fertility only in the presence of particular phenotypes. Other polymorphisms identify genetic variants that have an apparent effect on fertility only in the presence of particular environmental exposures. Still other polymorphisms identify genetic variants that have an apparent effect on fertility only in the presence of any combination of particular variants of other genes, presence of particular phenotypes, and particular environmental exposures.

Example 10
Correlation of Polymorphisms with Premature Ovarian Failure (POF)

0232 Polymorphisms among the sequences of target selected DNA from the pool of test subjects are identified, and may be classified according to where they occur in promoters, splice sites, or coding regions of a gene. Polymorphisms can also occur in regions that have no apparent function, such as introns and upstream or downstream non-coding regions. Although such polymorphisms may not be informative as to the functional defect of an allele, nevertheless, they are linked to the defect and useful for predicting likelihood of premature ovarian failure (POF). The polymorphisms are analyzed statistically to determine their correlation with the POF status of the test subjects. The statistical analysis indicates that certain polymorphisms identify gene defects that by themselves (homozygous or heterozygous) are sufficient to cause POF. Other polymorphisms identify genetic variants that increase the likelihood of, but do not cause, POF. Other polymorphisms identify genetic variants that have an apparent effect on POF only in the presence of particular variants of other genes. Other polymorphisms identify genetic variants that have an apparent effect on POF only in the presence of particular phenotypes. Other polymorphisms identify genetic variants that have an apparent effect on POF only in the presence of particular environmental exposures. Still other polymorphisms identify genetic variants that have an apparent effect on POF only in the presence of any combination of particular variants of other genes, presence of particular phenotypes, and particular environmental exposures.

Example 11
Correlation of Polymorphisms with Premature Maternal Aging

0233 Polymorphisms among the sequences of target selected DNA from the pool of test subjects are identified, and may be classified according to where they occur in promoters, splice sites, or coding regions of a gene. Polymorphisms can also occur in regions that have no apparent function, such as introns and upstream or downstream non-coding regions. Although such polymorphisms may not be informative as to the functional defect of an allele, nevertheless, they are linked to the defect and useful for predicting likelihood of premature decline in ovarian reserve and egg quality (i.e., maternal aging). The polymorphisms are analyzed statistically to determine their correlation with the maternal aging status of the test subjects. The statistical analysis indicates that certain polymorphisms identify gene defects that by themselves (homozygous or heterozygous) are sufficient to cause premature maternal aging. Other polymorphisms identify genetic variants that increase the likelihood of, but do not cause, premature maternal aging. Other polymorphisms identify genetic variants that have an apparent effect on premature maternal aging only in the presence of particular variants of other genes. Other polymorphisms identify genetic variants that have an apparent effect on premature maternal aging only in the presence of particular phenotypes. Other polymorphisms identify genetic variants that have an apparent effect on premature maternal aging only in the presence of particular environmental exposures. Still other polymorphisms identify genetic variants that have an apparent effect on premature maternal aging only in the presence of any combination of particular variants of other genes, presence of particular phenotypes, and particular environmental exposures.

Example 12
Diagnostics and Counseling

0234 A library of nucleic acids in an array format is provided for infertility diagnosis. The library consists of selected nucleic acids for enrichment of genetic targets wherein polymorphisms in the targets are correlated with variations in fertility. A patient nucleic acid sample (appropriately cleaved and size selected) is applied to the array, and patient nucleic acids that are not immobilized are washed away. The immobilized nucleic acids of interest are then eluted and sequenced to detect polymorphisms. According to the polymorphisms detected, the fertility status of the patient is evaluated and/or quantified. The patient is accordingly advised as to the suitability and likelihood of success of a fertility treatment or suitability or necessity of a particular in vitro fertilization procedure.

Example 13
Diagnostics and Counseling

0235 A complete DNA sequence of any number of or all of the genes in Tables 1-7 is determined using a targeted resequencing protocol. According to the polymorphisms detected and the phenotypic traits and environmental exposures reported, the fertility status of the patient is evaluated and/or quantified. The patient is accordingly advised as to the suitability and likelihood of success of a fertility treatment or suitability or necessity of a particular in vitro fertilization procedure.

Example 14
Diagnostics and Counseling

0236 A library of nucleic acids in an array format is provided for infertility diagnosis. The library consists of selected nucleic acids for enrichment of genetic targets wherein polymorphisms in the targets are correlated with variations in fertility. A patient nucleic acid sample (appropriately cleaved and size selected) is applied to the array, and patient nucleic acids that are not immobilized are washed away. The immobilized nucleic acids of interest are then eluted and sequenced to detect polymorphisms. According to the polymorphisms detected and the phenotypic traits and environmental exposures reported, the POF status of the patient or likelihood of future POF occurrence is evaluated and/or quantified. The patient is accordingly advised as to whether preventative egg or ovary preservation is indicated.
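As an illustration of the statistical step in Examples 9 to 11, where polymorphisms are "analyzed statistically to determine their correlation" with the status of the test subjects, the following is a minimal sketch only. The disclosure does not name a specific test; Fisher's exact test on a 2x2 carrier-by-status table is assumed here because it suits small cohorts such as the 15-versus-15 design of Example 2. The function name and the carrier counts are hypothetical and are not data from the disclosure.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the study groups (test, control); columns are
    (carriers of the variant, non-carriers). Assumed test, not one
    named in the disclosure.
    """
    row_test, row_ctrl = a + b, c + d
    carriers = a + c
    total_tables = comb(row_test + row_ctrl, carriers)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers in the test group
        # when the row and column totals are held fixed.
        return comb(row_test, x) * comb(row_ctrl, carriers - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, carriers - row_ctrl)
    highest = min(row_test, carriers)
    # Sum every table that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts for one polymorphism in a 15-versus-15 cohort:
# 9 carriers among 15 test subjects, 2 carriers among 15 controls.
p_value = fisher_exact_two_sided(9, 6, 2, 13)
print(f"two-sided p = {p_value:.4f}")
```

With these made-up counts the p-value comes out near 0.02. In practice many polymorphisms would be tested at once, so a multiple-testing correction would be needed before treating any single variant as correlated with fertility status.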

Example 15
Diagnostics and Counseling

0237 A complete DNA sequence of any number of or all of the genes in Tables 1-7 is determined using a targeted resequencing protocol. According to the polymorphisms detected and the phenotype and environmental exposures reported, the fertility status of the patient is evaluated and/or quantified. According to the polymorphisms detected and the phenotypic traits and environmental exposures reported, the POF status of the patient or likelihood of future POF occurrence is evaluated and/or quantified. The patient is accordingly advised as to whether preventative egg or ovary preservation is indicated.

Example 16
Diagnostics and Counseling

0238 A library of nucleic acids in an array format is provided for infertility diagnosis. The library consists of selected nucleic acids for enrichment of genetic targets wherein polymorphisms in the targets are correlated with variations in fertility. A patient nucleic acid sample (appropriately cleaved and size selected) is applied to the array, and patient nucleic acids that are not immobilized are washed away. The immobilized nucleic acids of interest are then eluted and sequenced to detect polymorphisms. According to the polymorphisms detected and the phenotypic traits and environmental exposures reported, the maternal aging status of the patient or likelihood of future premature maternal aging occurrence is evaluated and/or quantified. The patient is accordingly advised as to whether preventative egg or ovary preservation, minimization of certain environmental exposures such as alcohol intake or smoking, or mitigation of certain phenotypes such as having children at a younger age is indicated.

Example 17
Diagnostics and Counseling

0239 A complete DNA sequence of any number of or all of the genes in Tables 1-7 is determined using a targeted resequencing protocol. According to the polymorphisms detected and the phenotypic traits and environmental exposures reported, the fertility status of the patient is evaluated and/or quantified. According to the polymorphisms detected and the phenotype and environmental exposures reported, the maternal aging status of the patient or likelihood of future premature maternal aging occurrence is evaluated and/or quantified. The patient is accordingly advised as to whether preventative egg or ovary preservation, minimization of certain environmental exposures such as alcohol intake or smoking, or mitigation of certain phenotypes such as having children at a younger age is indicated.

Example 18
Whole Genome Sequencing for Female Infertility Biomarker Discovery

0240 Whole genome sequencing (WGS) allows one to characterize the complete nucleic acid sequence of an individual's genome. With the amount of data obtained from WGS, a comprehensive collection of an individual's genetic variation is obtainable, which provides great potential for genetic biomarker discovery. The data obtained from WGS can be advantageously used to expand the ability to identify and characterize female infertility biomarkers. However, the ability to identify unknown variations of fertility significance within the vast WGS datasets is a challenging task that is analogous to finding a needle in a haystack.

0241 Methods of the invention, according to certain embodiments, rely on bioinformatics to filter through WGS data in order to identify and prioritize variations of infertility significance. Specifically, the invention relies on a combination of clinical phenotypic data and an infertility knowledgebase to rank and/or score genomic regions of interest and their likely impact on different fertility disorders. In certain aspects, the filtering approach involves assessing sequencing data to identify genomic variations, identifying at least one of the variations as being in a genomic region associated with infertility, determining whether the at least one variation is a biologically-significant variation and/or a statistically-significant variation, and characterizing at least one identified variation as an infertility biomarker based on the determining step. A genomic region associated with infertility is any DNA sequence in which variation is associated with a change in fertility. Such regions may include genes (e.g., any region of DNA encoding a functional product), genetic regions (e.g., regions including genes and intergenic regions with a particular focus on regions conserved throughout evolution in placental mammals), and gene products (e.g., RNA and protein). In particular embodiments, the infertility-associated genetic region is a maternal effect gene, as described above. In particular embodiments, the infertility-associated genetic region is a gene (including exons, introns, and evolutionarily conserved regions of DNA flanking either side of said gene) that impacts fertility.

0242 This filtering approach facilitates rapid identification of functionally relevant variants within genomic regions of significance for fertility. The identified variations with infertility significance obtained from WGS data may be used in diagnostic testing, and ultimately assist physicians in data interpretation, guide fertility therapeutics, and clarify why some patients are not responding to treatment. The following illustrates use of WGS data to identify variants of interest in accordance with methods of the invention.

0243 FIG. 5 generally illustrates filtering through variations obtained from WGS sequencing data in order to identify variations of infertility significance. As shown in FIG. 5, the first step is to identify sequence variants in whole genome sequence. A typical whole genome can include up to four million variants. The next filtering step involves eliminating variants outside of regions of interest for female fertility (which amounts to about one million variants). Next, the filtering method isolates variants within regions of interest for female fertility, which is described herein as Fertilome nucleic acid (i.e., regions of the genome that control egg quality and fertility). Variations located within the Fertilome nucleic acid may be in the 100,000s. The variations within the Fertilome nucleic acid are further filtered to identify and score variations of infertility significance (such variations are typically present in double digits). Particularly, variations of infertility significance include those within regions predicted to affect biological function or that show a statistical correlation to infertility or treatment failure.

0244 Biologically-significant variations within the Fertilome nucleic acid include mutations that result in a change: 1) to a different amino acid predicted to alter the folding and/or structure of the encoded protein, 2) to a different amino acid occurring at a site with high evolutionary conservation in mammals, 3) that introduces a premature stop termination signal, 4) that causes a stop termination signal to be lost, 5) that introduces a new start codon, 6) that causes a start codon to be lost, or 7) that disrupts a splicing signal. Statistically significant variations within the Fertilome nucleic acid are described in relation to and listed in Tables 2 and 3. Other methods for classifying variations as statistically- or biologically-significant include scoring variations using an infertility knowledgebase (which is described in relation to Tables 5-7 above and FIG. 6 below). The infertility knowledgebase ranks genes based on attributes associated with infertility. The attributes include: diseases and disorders related to infertility, molecular pathways, molecular interactions, gene clusters, mouse phenotypes associated with each gene, gene expression data in reproductive tissues, proteomics data in oocytes, and accrued information from scientific publications through text-mining. Lists of ranked genes of interest are provided in Tables 5-7.

0245 FIG. 6 illustrates various data sources integrated into the infertility knowledgebase for analyzing whole-genome sequencing data according to certain embodiments. As shown in FIG. 6, information is obtained from private and public fertility-related data. Private and/or public fertility-related data may include implantation genes, idiopathic infertility genes, polycystic ovary syndrome (PCOS) genes, egg quality genes, endometriosis genes, and premature ovarian failure genes. The private and/or public fertility-related data is then subjected to the ABCoRE algorithm to provide genomic regions and variations of interest that can be introduced into a fertility database evidence matrix along with other fertility-related information. As described in the detailed description, the ABCoRE algorithm identifies fertility regions of interest by performing evolutionary conservation analysis of one or more genes obtained from the private and/or public fertility-related data. The other fertility-related information includes, for example, protein-protein interactions, pathway interactions, gene orthologs and paralogs, genomic "hotspots", gene protein expression and meta-analysis, and data from genomic studies. In operation, whole genomic sequencing data is compared to the compiled data in the fertility database evidence matrix to facilitate identification of potential genetic regions important for fertility. The fertility database evidence matrix filters through WGS variants to identify variants of fertility significance. In certain embodiments, the whole genomic sequencing data is also subjected to the SESMe algorithm that ranks each genetic region from most to least important for different aspects of female fertility.

0246 FIG. 7 illustrates a bioinformatics pipeline used to filter through WGS data to identify biomarkers associated with infertility according to certain embodiments. As shown in FIG. 7, samples are subjected to whole genome sequencing, mapping, and assembly. The WGS data is then analyzed to discover genetic variants such as SNPs, small indels, mobile elements, copy number variations, and structural variations. The identified variations are then assessed for statistical significance (see, for example, Tables 2 and 3 above). This includes correction for population stratification, variation-level significance tests, and gene-level significance tests. In addition, the biological significance of WGS variants is determined using the SnpEff and Variant Effect Predictor (www.ensembl.org) engines (see, for example, Table 1 above). Variants of biological and statistical significance are then entered into the infertility knowledgebase (i.e., Fertilome database) in order to classify those variants as fertility biomarkers.

0247 The following illustrates use of WGS data to identify variants of interest in accordance with methods of the invention.

0248 Samples were collected from female patients undergoing fertility treatment at an academic reproductive medical center, and categorized into idiopathic infertility or primary ovarian insufficiency (POI) study groups. Phenotypic information was collected for each patient by mining >200 variables from electronic health records. Genomic DNA extracted from blood samples underwent WGS by Complete Genomics (Mountain View, Calif.). Analysis of genetic variants from WGS was assisted by an infertility knowledgebase with >800 genomic regions of interest (ROI) ranked by a scoring algorithm predicting their likely impact on different fertility disorders, based on publications, data repositories (including protein-protein interactions and tissue expression patterns), meta-analyses of these data, and animal model phenotypes.

0249 The collected female samples were subjected to the processes/algorithms depicted in FIGS. 5-7 (described in more detail above). With those female samples, approximately 50,000 novel variants (approximately 1.6% of total variants observed) were identified as having fertility significance that had not been previously reported in databases such as the dbSNP reference. The identified fertility-related variants included single nucleotide polymorphisms (SNPs), insertions, deletions, copy number variations, inversions, and translocations. Some of the SNPs are predicted to have putative functional significance based on the knowledgebase. For example, the knowledgebase scored some SNPs as deleterious mutations due to potential loss of function or changes in protein structure.

0250 In certain aspects, the genomic data, such as WGS data, of a patient/subject population is subjected to a population stratification correction. Population stratification correction accounts for the presence of a systematic difference in allele frequencies between subpopulations in a population, possibly due to different ancestry. When conducting population stratification, data is compared to a number (e.g., 1,000) of ethnically diverse individuals as part of the 1000 Genomes Project (1000G). Principal components analysis (PCA) is applied to model and identify ancestry differences. In addition, computed association statistics are adjusted for the first two principal components.

0251 FIG. 13 illustrates population stratification correction of two patient groups. The patient groups include female patients undergoing non-donor in vitro fertilization (IVF) cycles. The patients were 38 years old or younger at the time of enrollment, and had no history of carrying a pregnancy beyond the first term before IVF treatment. Each patient had lack of an apparent cause for infertility (i.e., unexplained) after an evaluation of a complete medical history, physical examination, endocrine profile, and the results of an intimate partner's sperm analysis. The patients were divided into two groups. Group A included 11 patients that experienced no live birth or pregnancy beyond the first trimester after 3 or more IVF cycles. Group B included 18 patients that experienced live birth or pregnancy beyond the first trimester through use of IVF therapy. With population stratification correction, Group A and B patients cluster (are shown as black dots) with

East Asian, African, Hispanic, and European individuals as shown in the principal component analysis chart of FIG. 13. This data shows that ethnicity may be linked to infertility, or that certain genomic variations are more prevalent in certain ethnic populations. Accordingly, aspects of the invention involve assessing ethnicity of an individual, either through self-reporting by the individual (e.g., by a questionnaire) or via an assay that looks for known biomarkers related to genetic ethnicity of an individual. That ethnicity data (genetic or self-reported) may be used to guide testing, such as by ensuring that certain genomic variations are checked that are known to be associated with certain ethnic populations.

Example 19

[0252] Approximately 15% of couples experiencing difficulty conceiving are diagnosed with idiopathic infertility. Genetic polymorphisms could shed light on many of these currently unexplained cases by revealing disruptions to oocyte quality or uterine receptivity that may exist on a subcellular level.

[0253] In accordance with certain aspects, copy number variations are examined for their effect on female fertility using comparative genomic hybridization (CGH) arrays. CGH provides for methods of determining the relative number of copies of nucleic acid sequences in one or more subject genomes or portions thereof (for example, an infertility marker) as a function of the location of those sequences in a reference genome (for example, a normal human genome). As a result, CGH provides a map of losses and gains in nucleic acid copy number across the entire genome without prior knowledge of specific chromosomal abnormalities. Methods of the invention capitalize on the ability to detect copy number variations without the need for prior knowledge in order to detect potential mutations with infertility significance within patient populations that have unexplained infertility.

[0254] The following illustrates use of CGH arrays to identify copy number variants of interest in accordance with methods of the invention.

[0255] The study examined female patients undergoing non-donor in vitro fertilization (IVF) cycles. The patients were 38 years old or younger at the time of enrollment, and had no history of carrying a pregnancy beyond the first term before IVF treatment. Each patient lacked an apparent cause for infertility (i.e. unexplained) after an evaluation of a complete medical history, physical examination, endocrine profile, and the results of an intimate partner's sperm analysis.

[0256] The patients were divided into two groups. Group A included 11 patients that experienced no live birth or pregnancy beyond the first trimester after 3 or more IVF cycles. Group B included 18 patients that experienced live birth or pregnancy beyond the first trimester through use of IVF therapy.

[0257] FIG. 9 provides CGH array data of copy number variations detected in the study populations within statistically significant regions associated with infertility (i.e. copy number variations within the Fertilome nucleic acid). FIG. 10 illustrates a specific copy number variation detected in the GJC2 gene of Chromosome 1 within Groups A and B. This region is specifically expressed in both the oocyte and brain, and is known to be associated with embryo issues. As shown, the region within GJC2 showed deletion in the most infertile patients. FIG. 11 illustrates a specific copy number variation detected in the CRTC1 and GDF1 genes of Chromosome 19 within Groups A and B. CRTC1 is associated with ovary, oocyte, endometrium, and placenta expression. GDF1 is associated with defects in the formation of anterior visceral endoderm and mesoderm. As shown, both patient groups exhibit copy number deletions in those genes. FIG. 12 illustrates a specific copy number variation detected in a non-coding region of Chromosome 6. As shown, both patient groups exhibit copy number duplication in that region.

Example 20

[0258] In addition to using the existing infertility knowledgebase to identify new genetic variations associated with infertility (e.g., Example 18), methods of the invention further utilize the existing infertility knowledgebase to identify commonalities between known infertility genes and genes having no prior association with infertility. By identifying commonalities between infertility genes and genes having no prior association with infertility, one is able to expand the list of potential genes associated with infertility and guide understanding as to what gene functions and changes are causally linked to infertility. For example, genes having commonalities with known infertility genes can be identified as potential infertility biomarkers, and used in phenotypic studies (such as those performed in mice) related to infertility, thereby expanding the breadth of the infertility knowledgebase.

[0259] In order to determine commonalities between infertility genes and genes without prior association with infertility, methods of the invention utilize cluster analysis techniques. Generally, a cluster analysis involves grouping a set of objects in such a way that objects clustered in one group are more similar to each other than to objects in another group or cluster. Methods of the invention cluster known infertility genes with genes not associated with infertility based on features such as gene expression, phenotype, and genetic pathways. From the cluster analysis, one can identify genes without prior association with infertility that exhibit features with a high degree of similarity (relatedness) to infertility genes. Those genes exhibiting a high degree of similarity (as shown through the cluster analysis) can be identified as potential infertility biomarkers.

[0260] The following describes a clustering method used to identify a potential infertility biomarker in accordance with methods of the invention. The method is typically a computer-implemented method, e.g. utilizes a computer system that includes a processor and a computer-readable storage medium. The processor of the computer system executes instructions obtained from the computer-readable storage device to perform the cluster analysis.

[0261] In accordance with certain aspects, the method involves obtaining a gene data set that includes both known infertility genes and genes having no prior association with infertility. In certain embodiments, the gene data sets may be taken from known infertility databases, sequencing data obtained from patients, or sequencing data obtained from mouse modeling studies. The genes forming the cluster data set (those associated with infertility and those not known to be associated with infertility) are typically mammalian genes. The mammalian genes may correspond to mouse genes, human genes, or a combination thereof. A cluster analysis is then performed on the gene data set to determine a relationship between the one or more genes not associated with infertility and the known infertility genes. If a gene not associated with infertility is shown to cluster with a known infertility gene, the method provides for identifying that gene as a potential infertility biomarker.
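By way of illustration only, the steps just described (obtain a gene data set, cluster it, and flag genes that cluster with a known infertility gene) might be sketched as follows. This is a minimal sketch and not the implementation of the invention: the gene names, feature values, the choice of one minus the Pearson correlation as the distance, the use of average linkage, and the fixed number of clusters are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length feature rows."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 1.0  # assumption: a constant row is treated as uncorrelated
    return 1.0 - cov / (var_x * var_y) ** 0.5


def cluster_genes(features, n_clusters):
    """Agglomerative (average-linkage) clustering of genes into n_clusters groups."""
    clusters = [[gene] for gene in features]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        return mean([correlation_distance(features[g], features[h])
                     for g in c1 for h in c2])

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters (i < j always holds).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


def candidate_biomarkers(features, known_genes, n_clusters):
    """Genes with no prior association that share a cluster with a known gene."""
    candidates = set()
    for cluster in cluster_genes(features, n_clusters):
        if any(g in known_genes for g in cluster):
            candidates.update(g for g in cluster if g not in known_genes)
    return sorted(candidates)


# Hypothetical toy data: columns are three tissue expression levels and one
# binary phenotype indicator; all names and values are invented.
features = {
    "KNOWN_1": [9.0, 8.0, 1.0, 1.0],
    "GENE_X": [8.0, 9.0, 2.0, 1.0],
    "GENE_Y": [1.0, 2.0, 9.0, 0.0],
    "GENE_Z": [2.0, 1.0, 8.0, 0.0],
}
known_genes = {"KNOWN_1"}

print(candidate_biomarkers(features, known_genes, n_clusters=2))  # ['GENE_X']
```

In this toy data set GENE_X has nearly the same feature profile as KNOWN_1 and is therefore flagged, while GENE_Y and GENE_Z form their own cluster and are not. A real analysis would typically inspect the full dendrogram rather than cut it at a preset number of clusters.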
If the gene not associated with infertility does not cluster with a known infertility gene, then that gene is less likely to be causally linked to infertility in the same/similar manner as that known infertility gene.

[0262] Methods of the invention assess several features (or parameters) of genes in order to determine commonalities and thus cluster genes not associated with infertility with known infertility genes based on the commonalities. In certain embodiments, those features include gene expression, phenotypes, gene pathways, and a combination thereof. One or more of those features can contribute to a gene's position in the clustering.

[0263] Feature data (such as gene expression, phenotype, gene pathway, etc.) is obtained for both known infertility genes and genes not known to be associated with infertility. The feature and gene data is compiled to form a matrix that will be used to exhibit the cluster analysis. For example, the feature data is pre-processed to express each domain as a row and each feature as a column (or vice versa). For domains with continuous values such as gene expression, the features are the individual tissues where gene expression was measured, and each value in the matrix (Xij) represents the expression of gene i in tissue j. For domains with categorical values such as phenotypes, the features are the individual phenotypes, and each value in the matrix (Xij) is a binary indicator representing whether gene i is associated with phenotype j. All of the domain-specific matrices are then combined column-wise. A distance metric is then applied to each pair of rows and each pair of columns in the matrix. In certain embodiments, the distance metric is Distance = 1 - correlation. However, it is understood that other standard distance metrics could be used (e.g. Euclidean).

[0264] Standard hierarchical clustering is then used to cluster the rows and columns of the matrix in order to determine feature commonalities between known infertility genes and other genes. Various hierarchical clustering techniques are known in the art, and can be applied to methods of the invention for clustering infertility genes with genes not associated with infertility. Hierarchical clustering techniques are described in, for example, Sturn, Alexander, John Quackenbush, and Zlatko Trajanoski. "Genesis: cluster analysis of microarray data." Bioinformatics 18.1 (2002): 207-208; Yeung, Ka Yee, and Walter L. Ruzzo. "Principal component analysis for clustering gene expression data." Bioinformatics 17.9 (2001): 763-774; Eisen, Michael B., et al. "Cluster analysis and display of genome-wide expression patterns." Proceedings of the National Academy of Sciences 95.25 (1998): 14863-14868. Generally, clustering involves comparing features of one or more genes not associated with infertility with features of one or more known infertility genes, and categorizing the genes into one or more feature groups based on the comparison. After the comparison, the cluster analysis may further involve assigning a value to the categorized genes based on a degree of relatedness. For example, genes clustered together having highly similar or the same features may be assigned a high value (e.g. positive integer). The degree of relatedness may be highlighted on the resulting cluster matrix via colors, e.g. high degree of commonality being shown in red and low degree of commonality being shown in blue.

[0265] After a hierarchical clustering technique is applied to the gene/feature data, the gene clusters are displayed against certain feature categories (e.g. phenotype/gene expression category), which are then clustered to reflect commonality. For example, phenotypes of female reproduction are grouped together in one cluster, and phenotypes of embryo patterning, morphology and growth are grouped in a separate cluster, etc. The degree of relatedness or commonality between clustered genes (as determined by the cluster analysis) can then be highlighted on the resulting cluster matrix. For example, red may be used to indicate that the gene is associated with one very specific phenotype and/or is expressed at high levels in the associated tissue/physiological system indicated on the opposite axis; whereas blue may be used to indicate that the gene is associated with a number of different and varied phenotypes and/or is expressed at low levels in the associated tissue.

[0266] By clustering genes into feature-specific groups and color-coding genes with a high degree of relatedness, the resulting cluster matrix of the invention advantageously allows for visualization of groups of genes that are strongly associated with phenotypes relating to particular tissues or physiological systems (i.e. clusters of interest). Thus, cluster matrices of the invention allow one to quickly identify genes without prior association with infertility as potential infertility biomarkers based on their shown association (cluster) with known infertility biomarkers. This clustering and identification of potential infertility biomarkers is done independently from, and without correlating, a gene's proximity to other genes within, or location on, the Fertilome (genomic region associated with infertility). As a result, clustering provides an additional method of identifying infertility genes of interest that can be used to complement, and in addition to, other techniques for identifying infertility genes of interest.

[0267] The following describes a specific example of using the above-described cluster analysis to correlate genes not known to be associated with infertility and a known infertility gene.

[0268] Activin receptor 2B (ACVR2B) is a significant copy number variation identified in a cohort of patients with infertility (i.e. copy number variation in this gene was identified as being significantly associated with an infertile phenotype in humans). Activin receptor 2B is the receptor bound by Activin, a protein previously known in the art to be involved in both human and mouse reproduction and embryonic development. Activin/Nodal signaling regulates pluripotency and several aspects of patterning during early embryogenesis. Together with Inhibin and Follistatin, Activin is also involved in the complex feedback loops that selectively regulate FSH secretion.

[0269] A cluster analysis was performed that compared those features of ACVR2B and features of a plurality of genes not known to be associated with infertility. Based on the cluster analysis, several of the plurality of genes were determined to cluster with the ACVR2B gene due to a commonality between functional and phenotypic features. The genes clustered with the ACVR2B gene were thus identified as potential infertility biomarkers. FIG. 14 illustrates the results of a cluster analysis with ACVR2B.

What is claimed is:

1. A system for identifying a potential infertility biomarker, the system comprising:
a processor; and
a computer-readable storage device containing instructions that when executed by the processor cause the system to:
receive data on a set of genes, the set comprising genes known to be associated with infertility and genes having no prior association with infertility;

perform a cluster analysis to identify one or more of the genes that have no prior association with infertility that cluster with one or more of the genes known to be associated with infertility; and
identify at least one of the genes having no prior association with infertility as a potential infertility biomarker based on it clustering with one or more genes known to be associated with infertility.

2. The system of claim 1, wherein the cluster analysis comprises the steps of
identifying a feature of one or more of the genes known to be associated with infertility;
analyzing one or more of the genes having no prior association with infertility for the same feature; and
comparing the feature of the genes known to be associated with infertility with the feature of the genes having no prior association with infertility.

3. The system of claim 2, wherein the feature is selected from the group consisting of a phenotype, a gene expression pattern, a genetic pathway, and a combination thereof.

4. The system of claim 3, wherein the cluster analysis further comprises categorizing the one or more genes having no prior association with infertility into one or more feature groups based on the comparison.

5. The system of claim 4, wherein the cluster analysis further comprises assigning a value to the categorized genes based on a degree of relatedness.

6. The system of claim 1, wherein the one or more genes having no prior association with infertility are mammalian genes.

7. The system of claim 1, wherein the mammalian genes correspond to a species selected from mouse, human, and a combination thereof.

8. The system of claim 1, wherein the genes known to be associated with infertility are mammalian genes.

9. The system of claim 8, wherein the mammalian genes correspond to a species selected from mouse, human, and a combination thereof.

10. A computer-implemented method for identifying a potential infertility biomarker, the method comprising:
receiving to a computer, data on a set of genes, the data obtained at one infertility-associated database and sequencing data obtained from a biological sample, the set comprising genes known to be associated with infertility and genes having no prior association with infertility;
performing on the computer a cluster analysis to identify one or more of the genes that have no prior association with infertility that cluster with one or more of the genes known to be associated with infertility; and
identify at least one of the genes having no prior association with infertility as a potential infertility biomarker based on it clustering with one or more genes known to be associated with infertility.

11. The computer-implemented method of claim 10, wherein the cluster analysis comprises the steps of
identifying a feature of one or more of the genes known to be associated with infertility;
analyzing one or more of the genes having no prior association with infertility for the same feature; and
comparing the feature of the genes known to be associated with infertility with the feature of the genes having no prior association with infertility.

12. The computer-implemented method of claim 11, wherein the feature is selected from the group consisting of a phenotype, a gene expression pattern, a genetic pathway, and a combination thereof.

13. The computer-implemented method of claim 12, wherein the cluster analysis further comprises categorizing the one or more genes having no prior association with infertility into one or more feature groups based on the comparison.

14. The computer-implemented method of claim 13, wherein the cluster analysis further comprises assigning a value to the categorized genes based on a degree of relatedness.

15. The computer-implemented method of claim 10, wherein the one or more genes having no prior association with infertility are mammalian genes.

16. The computer-implemented method of claim 15, wherein the mammalian genes correspond to a species selected from mouse, human, and a combination thereof.

17. The computer-implemented method of claim 10, wherein the genes known to be associated with infertility are mammalian genes.

18. The computer-implemented method of claim 17, wherein the mammalian genes correspond to a species selected from mouse, human, and a combination thereof.

* * * * *