Supplementary figure 1 The mutated amino acid residues of SPEF2 and their conservation across species. The conservation analysis was conducted using the ClustalX2 tool. Conserved residues are marked in black (100% conservation), dark gray (80% conservation), and light gray (60% conservation). No shading denotes residues with less than 60% conservation.


Supplementary figure 2 The SPEF2 transcripts and SPEF2 domains affected by the MMAF-associated SPEF2 variants. (A) All six protein-coding transcripts (T201 to T216) of SPEF2 are illustrated according to the Ensembl database (GRCh38). The SPEF2 variant c.910C>T (p.Arg304*) affects the transcripts T201, T202, T203, T211 and T216, but does not affect T212. The remaining two SPEF2 variants affect the long transcripts T202, T203 and T216. (B) The corresponding positions and affected domains of the previously reported Spef2 mutations in animals. All three human stop-gain variants of SPEF2 (shown in red) are located before the IFT20 binding domain, which has an important role in sperm tail development. This domain was truncated in all three SPEF2-mutated men. The Spef2 mutations associated with PCD-like symptoms in animals are specifically located in the CH domain.


Supplementary figure 3 Differential interference contrast microscopy (DICM) analysis of the human spermatozoa. DICM images showed normal sperm morphology in a healthy control male (A), while MMAF phenotypes were observed in a SPEF2-mutated male (B to E). The data of the SPEF2-mutated subject A028-II-1 are exemplified here. Scale bars: 5 μm.

Supplementary figure 4 CFAP69 immunostaining in the spermatozoa from MMAF subjects with CFAP251 or DNAH1 mutations. The spermatozoa were stained with anti-CFAP69 (green) and anti-α-tubulin (red) antibodies. DNA was counterstained with DAPI as a nuclear marker. In the spermatozoa from a CFAP251-mutated subject and a DNAH1-mutated subject, CFAP69 staining was still obvious in the abnormal sperm flagella. Scale bars: 5 μm.

Supplementary figure 5 Chest X-ray image of the SPEF2-mutated subject A028-II-1. The subject has situs solitus, with normal organ placement (left cardiac apex, left gastric vesicle and right liver). The bilateral pulmonary veins were clear, and no significant lesion was observed in the lungs.

Supplementary figure 6 The expression of six protein-coding transcripts (T201 to T216) of SPEF2 in human control tissues. RNAs prepared from brain, lung, and sperm samples were analyzed using RT-PCR assays. After 35 cycles of amplification using transcript-specific primers (online supplementary table 1), PCR products of four protein-coding transcripts (T201, T202, T203 and T212) were clearly detected in the brain, lung and sperm samples. M: marker.

Supplementary figure 7 The IFT20 binding domain of SPEF2 encoded by the long transcripts T202, T203 and T216. The sequence corresponding to the IFT20 binding domain is indicated by a red underline according to a previous report.1 The conservation analysis was also conducted using the ClustalX2 tool. Conserved residues are marked in black (100% conservation), dark gray (80% conservation), and light gray (60% conservation). No shading denotes residues with less than 60% conservation.

Supplementary figure 8 Expression of SPEF2 transcripts T201/T202/T203 and T212 in spermatozoa of the SPEF2-mutated subject A028-II-1. The primers for transcripts T201/T202/T203 are applicable to both the intact transcripts in controls and the truncated transcripts affected by the SPEF2 variant c.910C>T (p.Arg304*). Compared to the control sample from a healthy man, the expression level of transcripts T201/T202/T203 was strongly reduced in the spermatozoa from subject A028-II-1. SPEF2 transcript T212 was not affected by the variant c.910C>T (p.Arg304*) and was normally expressed in the spermatozoa from subject A028-II-1. M: marker.

Supplementary table 1 Primers for the expression analysis of SPEF2 transcripts by RT-PCR.

| SPEF2 transcript | Primer sequences (5' to 3') | Product length |
| --- | --- | --- |
| T201/T202/T203/T211/T216-Spec * | Forward: TCTCGCTTGGAGCCAACACT; Reverse: CTGTATGTAATCCGCATCAG | 284 bp |
| T201 | Forward: AGACACTACCTGCTAACCC; Reverse: TACCCAGACAATTACTATGCT | 341 bp |
| T202 | Forward: GAGAAAAGGGAACAGAAGG; Reverse: TGGAAGATTTATTGCAGAA | 413 bp |
| T203 | Forward: TCACATAACCACCAAGCAG; Reverse: ACAGGAGTCAAACAGGGAG | 183 bp |
| T211 | Forward: AGGCTTCTTGGACAACTGG; Reverse: GTTGCCCAAAGTCTGAATG | 426 bp |
| T212 | Forward: TTTTCAAGAGCAATACTTAAAC; Reverse: CGAGTGCTTTCAAAGTACG | 113 bp |
| T216 | Forward: CACTCAGCAGGCAGAGGCA; Reverse: ACTTCCAATCACGGTAGCC | 402 bp |

* Primers used to detect SPEF2 transcripts of T201/T202/T203/T211/T216.

Supplementary table 2 Average semen parameters in the patients mutated in SPEF2 and DNAH1 in the study.

| Subject | MMAF with SPEF2 mutations (n=3) | MMAF with DNAH1 mutations (n=16) | Overall MMAF (n=50) |
| --- | --- | --- | --- |
| Semen Parameters | | | |
| Semen volume (mL) | 3.4±0.6 | 2.8±1.2 | 3.2±1.5 |
| Sperm concentration (10^6/mL) | 13.6±6.6* | 28.5±30.9 | 31.9±34.0 |
| Motility (%) | 0.2±0.3* | 2.4±4.1 | 2.9±5.6 |
| Progressive motility (%) | 0±0* | 1.6±3.9 | 0.8±2.4 |
| Sperm Morphology | | | |
| Absent flagella (%) | 19.7±7.6 | 15.7±12.4 | 17.7±11.4 |
| Short flagella (%) | 36.0±11.5 | 45.2±10.7 | 43.6±16.7 |
| Coiled flagella (%) | 13.7±9.1* | 23.9±11.5 | 25.3±16.6 |
| Angulation (%) | 2.5±2.3 | 3.9±3.6 | 3.2±3.0 |
| Irregular caliber (%) | 2.0±2.6 | 2.9±3.9 | 2.7±3.5 |
| Normal flagella (%) | 26.2±11.3* | 8.4±9.6 | 7.5±9.2 |

Values are mean ± SD; n = total number of patients in each group. *Significant P < 0.05 (Student's t-test) for the comparisons between the SPEF2 and DNAH1 groups.

ONLINE SUPPLEMENTARY METHODS

Variant annotation

The ANNOVAR software was used for functional annotation using information from OMIM, Gene Ontology, KEGG Pathway, SIFT, PolyPhen-2, MutationTaster, 1000 Genomes Project (1KGP), Exome Aggregation Consortium (ExAC), and gnomAD.2,3,4,5,6,7,8 MMAF has been assumed to follow an autosomal recessive inheritance according to previous pedigree analyses.9,10 Therefore, we mainly focused on homozygous or compound heterozygous rare variants identified by WES. Because MMAF leads to male infertility, we filtered out DNA polymorphisms with allele frequencies of > 1% in the human populations archived in the databases of 1KGP, ExAC, and gnomAD. Nonsense, frameshift, and essential splice-site variants were preferred. Missense variants predicted to be potentially deleterious simultaneously by the bioinformatic tools of SIFT, PolyPhen-2 and MutationTaster were also kept for further evaluation.5,6,7
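The filtering logic described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Variant` record, its field names, the consequence labels, and the function names are all hypothetical, and the annotations are assumed to have already been produced (for example by ANNOVAR) and parsed into these records. Compound heterozygosity is approximated here as two or more retained heterozygous variants in the same gene, without checking phase.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MAX_ALLELE_FREQ = 0.01  # variants with allele frequency > 1% are filtered out
LOSS_OF_FUNCTION = {"nonsense", "frameshift", "splice_site"}  # hypothetical labels
PREDICTORS = ("SIFT", "PolyPhen-2", "MutationTaster")


@dataclass
class Variant:
    """Hypothetical record for one annotated variant."""
    gene: str
    name: str
    consequence: str          # e.g. "nonsense", "missense"
    zygosity: str             # "hom" or "het"
    allele_freqs: dict = field(default_factory=dict)  # database -> frequency
    deleterious: dict = field(default_factory=dict)   # predictor -> bool


def is_rare(v: Variant) -> bool:
    # A variant absent from a database has no entry and counts as rare.
    return all(af <= MAX_ALLELE_FREQ for af in v.allele_freqs.values())


def is_damaging(v: Variant) -> bool:
    # Loss-of-function variants are preferred; missense variants are kept
    # only when all three predictors agree that they are deleterious.
    if v.consequence in LOSS_OF_FUNCTION:
        return True
    if v.consequence == "missense":
        return all(v.deleterious.get(tool, False) for tool in PREDICTORS)
    return False


def recessive_candidates(variants):
    """Return {gene: [variants]} compatible with recessive inheritance."""
    by_gene = defaultdict(list)
    for v in variants:
        if is_rare(v) and is_damaging(v):
            by_gene[v.gene].append(v)
    result = {}
    for gene, kept in by_gene.items():
        n_hom = sum(v.zygosity == "hom" for v in kept)
        n_het = sum(v.zygosity == "het" for v in kept)
        # Homozygous, or possibly compound heterozygous (phase not checked).
        if n_hom >= 1 or n_het >= 2:
            result[gene] = kept
    return result
```

In practice, candidates passing such a filter would still need confirmation of phase and segregation, which this sketch does not attempt.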

References

1 Sironen A, Hansen J, Thomsen B, Andersson M, Vilkki J, Toppari J, Kotaja N. Expression of SPEF2 during mouse spermatogenesis and identification of IFT20 as an interacting protein. Biol Reprod 2010;82:580-90.

2 Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.

3 Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25:25-9.

4 Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res 2017;45:D353-D61.

5 Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009;4:1073-81.

6 Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248-9.

7 Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 2014;11:361-2.

8 Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, Tukiainen T, Birnbaum DP, Kosmicki JA, Duncan LE, Estrada K, Zhao F, Zou J, Pierce-Hoffman E, Berghout J, Cooper DN, Deflaux N, DePristo M, Do R, Flannick J, Fromer M, Gauthier L, Goldstein J, Gupta N, Howrigan D, Kiezun A, Kurki MI, Moonshine AL, Natarajan P, Orozco L, Peloso GM, Poplin R, Rivas MA, Ruano-Rubio V, Rose SA, Ruderfer DM, Shakir K, Stenson PD, Stevens C, Thomas BP, Tiao G, Tusie-Luna MT, Weisburd B, Won HH, Yu D, Altshuler DM, Ardissino D, Boehnke M, Danesh J, Donnelly S, Elosua R, Florez JC, Gabriel SB, Getz G, Glatt SJ, Hultman CM, Kathiresan S, Laakso M, McCarroll S, McCarthy MI, McGovern D, McPherson R, Neale BM, Palotie A, Purcell SM, Saleheen D, Scharf JM, Sklar P, Sullivan PF, Tuomilehto J, Tsuang MT, Watkins HC, Wilson JG, Daly MJ, MacArthur DG, Exome Aggregation Consortium. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285-91.

9 Yang SM, Li HB, Wang JX, Shi YC, Cheng HB, Wang W, Li H, Hou JQ, Wen DG. Morphological characteristics and initial genetic study of multiple morphological anomalies of the flagella in China. Asian J Androl 2015;17:513-5.

10 Coutton C, Escoffier J, Martinez G, Arnoult C, Ray PF. Teratozoospermia: spotlight on the main genetic actors in the human. Hum Reprod Update 2015;21:455-85.