The “A-β-” subtype of Ketosis-Prone Diabetes (KPD) is not predominantly a monogenic diabetic syndrome

Supplementary Data

Wade C. Haaland, PhD1,2, Diane I. Scaduto1,2, Mario R. Maldonado, MD4,5, Dena L Mansouri1,3, Ramaswami Nalini, MD4,6, Dinakar Iyer, PhD4, Sanjeet Patel, MD4, Anu Guthikonda, MD4, Christiane S. Hampe, PhD7, Ashok Balasubramanyam, MD4,6, Michael L. Metzker, PhD1,2,3

1 Human Genome Sequencing Center, 2 Interdepartmental Program in Cell and Molecular Biology, 3 Department of Molecular and Human Genetics, 4 Translational Metabolism Unit, Division of Diabetes, Endocrinology, and Metabolism, Baylor College of Medicine, Houston, TX; 5 Novartis, Inc., Basel, Switzerland; 6 Endocrine Service, Ben Taub General Hospital, Houston, TX; 7 Department of Medicine, University of Washington, Seattle, WA.

Research Design and Methods

DNA isolation protocol for A-β- KPD patients

Genomic DNA was extracted from peripheral blood leukocytes (PBLs). Whole blood was collected in acid citrate dextrose vacutainers, and the buffy coat was separated after centrifugation and treated with RBC-lysis buffer (10 mM Tris-HCl, pH 7.5; 320 mM sucrose; 5 mM MgCl2; 1% Triton X-100). PBLs were isolated by centrifugation and treated with PBL-lysis buffer (400 mM Tris-HCl, pH 7.5; 60 mM Na2EDTA; 150 mM NaCl; 1% SDS), followed by deproteinization using sodium perchlorate. Genomic DNA was extracted using chloroform, and the aqueous layer was precipitated with absolute ethanol. DNA samples were stored at -80°C.

Multiple displacement amplification (MDA) of genomic A-β- KPD DNA

Whole genome amplification was performed for all KPD samples using the REPLI-g protocol (QIAGEN, Germantown, MD). Approximately 10-25 ng of genomic DNA was denatured in 5 μL denaturation solution (50 mM KOH; 0.125 mM Na2EDTA) and incubated at room temperature for three minutes. Following the addition of neutralization buffer, φ29 DNA polymerase reactions were carried out according to the manufacturer's protocol and incubated at 30°C for 16 hours. Reactions were subsequently heat-inactivated at 65°C for three minutes and precipitated using two volumes of absolute ethanol (100 μL). Precipitated DNA was spooled and resuspended in 400 μL of 10 mM Tris-HCl, pH 8.0; 1 mM Na2EDTA. Variants listed in Table 4 were verified by repeating PCR and sequencing directly from the corresponding genomic DNA sources. All samples except KPD0014 showed complete correspondence in variant calls between MDA-derived and genomic DNA-derived data. For KPD0014, MDA-derived data revealed a homozygous "T" base for the R133W variant, while the genomic DNA-derived data revealed a heterozygous "C/T" base; the latter is reported in the table.

Primers

PCR and sequencing primers were either synthesized using an Applied Biosystems (AB) model 394 DNA synthesizer (Foster City, CA) or purchased from Integrated DNA Technologies (Coralville, IA). A number of PCR primers contained 5'-end universal sequences, which are useful for high-throughput sequencing because they serve as docking sites for complementary universal sequencing primers (Supplementary Table 1).
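The tailing scheme described above can be sketched in a few lines of Python. The universal tails are the two sequences given in the Table A1 legend, and the sequencing primers are the 780 and WRev primers given in the sequencing section; the function names `add_tail` and `docks_on` are hypothetical helpers introduced only for this illustration, and the gene-specific primer used in the example is the HNF1a_2.1 forward primer from Table A1.

```python
# Universal 5' tails, as given in the Table A1 legend.
UNIVERSAL_FWD_TAIL = "CGTTGTAAAACGACGGCCAGT"  # marked * in Table A1
UNIVERSAL_REV_TAIL = "GCTATGACCATGATTACGCC"   # marked † in Table A1

# Universal sequencing primers, as given in the sequencing section.
SEQ_PRIMER_780 = "GTAAAACGACGGCCAGT"    # forward-strand sequencing primer
SEQ_PRIMER_WREV = "TATGACCATGATTACGCC"  # reverse-strand sequencing primer


def add_tail(tail: str, gene_specific: str) -> str:
    """Return the full PCR primer: universal tail followed by the gene-specific part."""
    return tail + gene_specific


def docks_on(tailed_primer: str, sequencing_primer: str) -> bool:
    """True if the sequencing primer sequence is contained in the tailed primer.

    A sequencing primer with the same sequence as part of the tail anneals to
    the tail's complement in the amplified product.
    """
    return sequencing_primer in tailed_primer


# Example: HNF1a_2.1 forward primer (gene-specific part from Table A1).
hnf1a_2_1_fwd = add_tail(UNIVERSAL_FWD_TAIL, "GGCATAAATGACCATACCTC")
```

Under these assumptions, any amplicon made with a *-tailed forward primer can be read with the 780 primer and any amplicon made with a †-tailed reverse primer can be read with WRev, regardless of the gene-specific portion.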

Polymerase Chain Reaction (PCR)

PCR was performed using either HotMaster Taq DNA polymerase (Eppendorf North America, Westbury, NY) or FastStart Taq DNA polymerase (Roche, Indianapolis, IN) according to the manufacturer's protocol. Twenty-five ng of MDA DNA or 100 ng of genomic DNA was used as template for PCR reactions. Cycling conditions for the HotMaster reactions were as follows: two minutes at 94°C followed by 35 cycles of 94°C for 20 seconds, 57°C for 20 seconds, and 65°C for 90 seconds, then 65°C for five minutes. Cycling conditions for the FastStart reactions were as follows: six minutes at 95°C followed by 35 cycles of 95°C for 30 seconds, 57°C for 30 seconds, and 72°C for 90 seconds, then 72°C for seven minutes. PCR products were purified using size-exclusion Montage PCR96 plates (Millipore, Bedford, MA) according to the manufacturer's protocol.
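As a worked check of the two cycling programs, the sketch below encodes the stated hold times and sums them. The `CyclingProgram` class and its field names are hypothetical and exist only for this illustration; the totals cover programmed hold times only and ignore the thermal cycler's ramping between temperatures, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Hold times (seconds) for a three-step PCR program."""
    initial_denaturation_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    extend_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times, excluding temperature ramping."""
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values taken from the cycling conditions stated in the text.
HOTMASTER = CyclingProgram(120, 35, 20, 20, 90, 300)
FASTSTART = CyclingProgram(360, 35, 30, 30, 90, 420)

hotmaster_minutes = HOTMASTER.hold_time_s() / 60
faststart_minutes = FASTSTART.hold_time_s() / 60
```

With these values the HotMaster program holds for 4,970 seconds (about 83 minutes) and the FastStart program for 6,030 seconds (about 100 minutes).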

Direct DNA Sequencing of PCR products

Purified PCR products were sequenced with 1/16 diluted BigDye Terminator v3.1 (AB) sequencing chemistry. Ten-μL reactions contained 0.5 μL BigDye Terminator v3.1 reagent diluted with reaction buffer, a final sequencing primer concentration of 0.4 μM, and two μL purified PCR product. For PCR primers containing 5'-end universal sequences, forward strands were sequenced using the 780 primer (5'-GTAAAACGACGGCCAGT) and reverse strands were sequenced using the WRev primer (5'-TATGACCATGATTACGCC). Cycling conditions were as follows: 96°C for one minute followed by 25 cycles of 96°C for ten seconds, 50°C for five seconds, and 60°C for four minutes. Sequencing reactions were purified by ethanol precipitation and analyzed on model 3100 or model 3730xl sequencers (AB) using Sequencing Analysis Software v5.1 and KB basecaller (AB). Heterozygous bases were automatically called if the minor peak was ≥25% of the height of the major peak. Sequence data for forward and reverse strands were then aligned and manually inspected to form double-stranded consensus contigs for each individual using the Sequencher program (Gene Codes Corp., Ann Arbor, MI). Heterozygous base calls were confirmed when present in both strands of the consensus contig. Allele frequencies were calculated from multiple sequence alignments, and Hardy-Weinberg equilibria for sequence variants were evaluated using the χ² test.
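The calling rule and the equilibrium test above can be illustrated with a minimal Python sketch. It is not the basecaller's or the authors' implementation: the function names are hypothetical, the genotype counts in the example are invented for illustration, and the χ² statistic shown is the standard one-locus, two-allele goodness-of-fit test (one degree of freedom), which is assumed here because the text does not give further detail.

```python
def is_heterozygous(major_peak: float, minor_peak: float,
                    threshold: float = 0.25) -> bool:
    """Apply the stated rule: call a heterozygote when the minor peak
    is at least 25% of the height of the major peak."""
    return minor_peak >= threshold * major_peak


def allele_frequencies(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Return (p, q): reference and alternate allele frequencies."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    q = (n_het + 2 * n_hom_alt) / n_alleles
    return 1.0 - q, q


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (p^2, 2pq, q^2)."""
    n = n_hom_ref + n_het + n_hom_alt
    p, q = allele_frequencies(n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts (not taken from the study).
p, q = allele_frequencies(20, 14, 3)
chi2 = hwe_chi_square(20, 14, 3)
# With one degree of freedom, a statistic below 3.841 is not significant at the 0.05 level.
deviates_from_hwe = chi2 > 3.841
```

For rare variants, some expected genotype counts become very small and the χ² approximation is less reliable; an exact test is a common alternative, although the text does not indicate that one was used.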

TABLE A1. PCR primers.

Gene              Exon      Primer name  Forward primer             Reverse primer              Region (bp)‡
HNF1A (4,864 bp)  Prom/Ex1  HNF1a_P.1    5'-TCCCATCGCAGGCCATAGCTC   5'-CCGTCTGCAGCTGGCTCAGTT    730
                            HNF1a_1.3    *5'-TAAACAGAACAGGCAGGG     †5'-AGGAGCAGTTCAGGGGCTG
                            HNF1a_1.4    *5'-TGCAAGGAGTTTGGTTTGTG   †5'-TCAAACCTCCAAGCAAGGAC
                  Ex2       HNF1a_2.1    *5'-GGCATAAATGACCATACCTC   †5'-GGACCCATTTCATTCATC      564
                  Ex3       HNF1a_3.1    *5'-TAGTGATGTTTGCCTTG      †5'-TAAGCCAATATCAGGAG       489
                  Ex4       HNF1a_4.1    *5'-CACTTTATGAATGGAGAGAC   †5'-AGAGGTTTAGGTGACTGCTG    486
                  Ex5&6     HNF1a_5.1    *5'-GTCAGGTATAGCACTAGGC    †5'-GCTGCTGAGACCTACGAG      901
                            HNF1a_6.3    *5'-GCCTGCTGAGTACAGAAG     †5'-CCTGCTTGAGTTGCTGAG
                  Ex7       HNF1a_7.2    *5'-ATGACTTGCCAGAGCCAC     †5'-ACTAGGGATCTGCTCACAC     550
                  Ex8&9     HNF1a_8/9.1  *5'-ATCTCCAACTGCTGCCCAG    †5'-ACGCCTGCCAGTGCTTCC      511
                            HNF1a_8/9.2  *5'-GTCTCCAGGTGACAAGAG     †5'-GTGGGACCAACATGGAAG
                  Ex10      HNF1a_10.1   *5'-GTGTTTGACTCAGCCTAGC    †5'-ATGAACAGGCTTTGCTCC      633

HNF1B (3,658 bp)  Ex1       HNF1b_1.3    *5'-GCCCTTCCCACTAATTTG     †5'-GGCGCAGTGTCACTCAGG      617
                  Ex2       HNF1b_2.1    *5'-GGCAGTCACCTTCTCCTCTG   †5'-CCAAGGCCAAATCTACTTGC    485
                  Ex3       HNF1b_3.1    *5'-CTTCGTCCGTTGTCTGTCTG   †5'-TCTGTGTACTTGCCCACCTG    356
                  Ex4       HNF1b_4.2    *5'-ATGGATTGGCCTTTTCTCTG   †5'-TAAGATCCGTGGCAAGAACC    428
                  Ex5       HNF1b_5.3    5'-CTGGTGGCACTAATGTTC      5'-GCCTTGTGAGAAGTTGTG       244
                  Ex6       HNF1b_6.1    *5'-CACATCGTGTTGGAAACTGC   †5'-GACATTGAATCTCCTGAAGG    443
                  Ex7       HNF1b_7.1    *5'-CGGTGACTGGGACATTGAG    †5'-TAGGCAGGGAAAAGTGACCA    373
                  Ex8       HNF1b_8.3    5'-TACCTGTGTCTTTGCCTG      5'-GGGAGCCTCAGAAGGATC       259
                  Ex9       HNF1b_9.1    *5'-CAAACTAATGGCCCATGACC   †5'-CAGTGTGTTTGGCTCAGTTC    453

HNF4A (5,213 bp)  Ex1P2     HNF4a_P2     5'-TTCTGCTCCGGCCCTGTC      5'-AAGCTGACCGCAGTCCCG       416
                  Ex1       HNF4a_1.1    5'-GTCAAATGAGTGCCCGTG      5'-TTCACTTGGCAACACCTG       606
                  Ex2       HNF4a_2.1    *5'-GCCTTCCTAGAGAAAGC      †5'-AGACTTAGTATTGTGCCTG     662
                  Ex3       HNF4a_3.1    *5'-CCAGAGGTCAAGGTTCC      †5'-TGAGGAGCCAAGAGTG        573
                  Ex4       HNF4a_4.1    *5'-CTGATGTGGGCCTGTTCT     †5'-GCCCTCAGTGAAGGTGAAG     457
                  Ex5       HNF4a_5.1    *5'-CTCCCTCCCTCCGTTTTTAC   †5'-ACGGCTATATCCCAGG        176
                            HNF4a_5.2    5'-GCATTTTCTTCCCTGTATC     5'-TACTGCCCACCATCCACG
                  Ex6       HNF4a_6.1    *5'-GCACATGTTCTTTCCCCTTC   †5'-AATGCTGGGAATTTGGTGAC    458
                  Ex7       HNF4a_7.1    5'-TATCTTCTGAATCTGGGC      5'-CATCTTGAACCCCTGACC       549
                  Ex8       HNF4a_8.1    *5'-ACAAGTCAGGGGACATCTGG   †5'-TTGTTCCCATTTTTCTGG      453
                  Ex9       HNF4a_9.1    *5'-AATATTGGATGGGCTGGT     †5'-TCATCCCAACAATGGCTTC     436
                  Ex10      HNF4a_10.1   *5'-GGGACTCACAGAAGGTTGAG   †5'-TTCATCCTTCCCATTCCTG     427

PDX1 (1,285 bp)   Ex1       PDX1_1.4     5'-GCCACACAGTGCCAAATC      5'-CAGAGAGAAGGCTCCTGG       541
                  Ex2       PDX1_2.4     5'-GCTTGAGTTACTAGGGAAG     5'-GTTTTCCCCTTCGGTCTAAG     744

ND1 (1,703 bp)    Ex1       ND1_1.1      *5'-ATACAAATGGGCAGGTCACG   †5'-TCTCGCAAACGCACACATAC    449
                  Ex2       ND1_2A.2     5'-GCCTCTCCCTTGTTGAATG     5'-GACAGAGCCCAGATGTAG       1,254
                            ND1_2B.2     5'-GCCTTGCTATTCTAAGAC      5'-GTCTATGGGGATCTCGCA
                            ND1_2C.1     5'-CAAAGCCACGGATCAATC      5'-GGTGAACAGGAACTTTGA

GCK (4,367 bp)    Ex1       GCK_1A.1     *5'-CTCAGGAGCACAGTAAG      †5'-TTCTCAAAGAGCCTGTGC      575
                            GCK_1A.2     *5'-CTCAGGAGCACAGTAAG      5'-GCAAGCAAACACTCCCAG
                  Ex2       GCK_2.4      5'-CCTGTAGTAAAGTTTGAG      5'-ACTTCATTGCTCCCCAGG       406
                            GCK_2.5      5'-CCTGTAGTAAAGTTTGAG      †5'-GCTGTGAGTCTGGGAGTG
                  Ex3       GCK_3.1      *5'-GACCCCTTCCACAGTTG      †5'-TCTGGTAAACTGGACAAGG     586
                  Ex4       GCK_4.1      *5'-TGGAGCCTCAGGAATAG      †5'-TTAGAGGTGGCAGGTGAC      326
                            GCK_4.2      5'-AGTGTCCCTGAGGAATAG      5'-CACTGGAGTGGGGTGATC
                  Ex5&6     GCK_5.1      *5'-CCTCCAGTATATGTTAGC     †5'-GGTGCTTCCATCTTGATAC     482
                  Ex7       GCK_7.1      *5'-GAAGGTGTTACTGTTGCC     †5'-GCTTACGAACGGATTGTC      513
                  Ex8       GCK_8.3      *5'-GGTCTTTGAACTATCTGTC    †5'-CTGAGACCAAGTCTGC        425
                  Ex9       GCK_9.1      *5'-CTGGGCTTAGTTAGAGGG     †5'-GACTACGAAATCTTGGAGC     510
                            GCK_9.2      *5'-GTACTAACCAGTCCCTGGC    †5'-GACTACGAAATCTTGGAGC
                  Ex10      GCK_10.1     *5'-GCTCCAAGATTTCGTAGTC    †5'-AAGTCCTGAGTGAGCAAC      544

PAX4 (3,097 bp)   Ex1-3     PAX4_1-3.1   5'-AGGTGGTGTGTGGATACCTC    5'-GATTTGGCTGTGATTAGCCC     1,086
                  Ex4-6     PAX4_4-6.1   5'-CAGCTTGGCTCTGCTCTTCT    5'-TGGTGCAGAGAAATCACCTG     960
                  Ex7-9     PAX4_7-9.1   5'-AGTGGCTGACTTTCCTAGAAC   5'-TGGGCAGGATGGTATTAGATCT   1,051 TCTCTATG

PAX4 sequencing primers

Exon      Primer name    Primer
Ex1-3     PAX4-1bF       5'-CGCACAGGTGTCTTGGAG
          PAX4-1R        5'-CCTTCAGAGGAGCCCTTTCT
          PAX4-2R        5'-GGCAGCCTCCTCTCTCTCTC
          PAX4-3bR       5'-GGCTGGACACTCACCCTT
Ex4-6     PAX4-4bF       5'-CAAGGAGGAAGAGTCTGG
          PAX4-4R(2)     5'-CTGAGGACTCTCTGACCCTCC
          PAX4-5F        5'-GGAGACCCATGCCTTGCTCC
          PAX4-5R        5'-CCCTCCCTGCTCTAGCTTTT
          PAX4-6F        5'-TGAGATCAGCAGGTGACAGG
Ex7-9     PAX4-6bR       5'-GAGCCCTTCAGTCTTCCC
          PAX4-7bF       5'-GGCAACAGCACCAGAAAG
          PAX4-8FpolyT   5'-TCTTGCTTTTTTTTTTTA
          PAX4-9bR       5'-GACATCAGTTTCCCACCC

5'-Universal sequences: * = 5'-CGTTGTAAAACGACGGCCAGT; † = 5'-GCTATGACCATGATTACGCC

‡ For multiple primer pairs, the analyzed region length was determined from the minimal overlap of amplicons.

TABLE A2a. Sequence variants identified in A-β- KPD patients with MAF <0.05

Exon          Patient ID                 Allele         dbSNP       Frequency
GCKEx1a       KPD0020, KPD0027           267bp 5' G→T               4.2%
GCKEx1a       KPD0020, KPD0027           217bp 5' C→G               4.2%
GCKEx1a       KPD0123                    A11T G→A                   1.4%
GCKEx4        KPD0110                    75bp 5' C→T                1.4%
GCKEx4        KPD0018                    14bp 3' A→G                1.4%
GCKEx4        KPD0110                    122bp 3' G→C               1.4%
GCKEx8        KPD0163                    18bp 3' G→A                1.4%
HNF1AEx1      KPD0014                    L92L C→T       rs34056805  1.4%
HNF1AEx2      KPD0203                    A174V C→T                  1.4%
HNF1AEx2      KPD0119                    53bp 3' C→G                1.4%
HNF1AEx6      KPD0102                    115bp 3' A→G               1.4%
HNF1AEx7      KPD0001, KPD0115, KPD0116  68bp 3' A→G                4.1%
HNF1AEx8/9    KPD0206                    7bp 5' C→T                 1.4%
HNF1AEx8/9    KPD0102, KPD0110, KPD0119  G574S G→A      rs1169305   4.1%
HNF4AP2       KPD0115                    50bp 5' G→C                1.4%
HNF4AEx3      KPD0114                    181bp 5' C→T   rs6093976   1.4%
HNF4AEx3      KPD0102, KPD0123, KPD0163  241bp 5' A→T               4.1%
HNF4AEx3      KPD0102, KPD0123, KPD0163  19bp 3' C→T                4.1%
HNF4AEx4      KPD0119                    47bp 5' T→C                1.4%
HNF4AEx4      KPD0119                    34bp 5' T→C                1.4%
HNF4AEx4      KPD0119                    N153N T→C                  1.4%
HNF4AEx4      KPD0119                    105bp 3' G→C               1.4%
HNF4AEx4      KPD0119                    166bp 5' C→A               1.4%
HNF4AEx4      KPD0119                    175bp 5' G→A               1.4%
HNF1AEx6      KPD0098                    42bp 5' G→T    rs3751156   1.4%
HNF4AEx8      KPD0123, KPD0163           44bp 5' G→C    rs3212207   2.8%
HNF4AEx9      KPD0119                    139bp 3' T→C               1.4%
HNF4AEx10     KPD0069, KPD0118           83bp 5' G→A                2.7%
HNF1BEx3      KPD0006                    N228K C→G                  1.4%
HNF1BEx9      KPD0001                    47bp 3' T→G    rs8068014   2.7%
PDX1Ex1       KPD0069, KPD0123, KPD0216  CCAAT C→T                  4.2%
PDX1Ex1       KPD0193                    P33T C→A                   1.4%
PDX1Ex1       KPD0193                    F100F C→T                  1.4%
PDX1Ex2       KPD0154                    68bp 5' G→T                1.4%
PDX1Ex2       KPD0053                    P239Q C→A                  1.4%
ND1Ex1 (UTR)  KPD0123                    207bp 5' G→T   rs8192554   1.4%
PAX4Ex2       KPD0020                    150bp 3' G→A   rs2233577   1.4%
PAX4Ex3       KPD0025, KPD0216           122bp 5' C→T               2.7%
PAX4Ex3       KPD0014, KPD0193, KPD0208  R133W C→T      rs2233578   4.1%
PAX4Ex7       KPD0042                    46bp 5' C→T    rs2233581   1.4%
PAX4Ex9       KPD0089, KPD0115           73bp 5' C→G                2.7%

TABLE A2b. Sequence variants identified in A-β- KPD patients with MAF ≥0.05

Exon          Allele             dbSNP       Frequency
GCKEx1a       84bp 5' C→G        rs13306391  6.9%
GCKEx4        58bp 3' A→T                    8.1%
GCKEx4        87bp 3' A→T        rs2268573   27.0%
GCKEx6        38bp 3' G→C        rs2268574   20.3%
GCKEx6        85bp 3' A→G        rs2268575   18.9%
GCKEx9        8bp 3' G→A         rs2908274   43.2%
GCKEx9        49bp 3' C→T        rs13306387  12.2%
TCF1Ex1       L17L C→G           rs1169289   44.6%
TCF1Ex1       I27L A→C           rs1169288   28.4%
TCF1Ex2       158bp 5' C→T       rs1169292   29.7%
TCF1Ex2       91bp 5' A→G        rs1169293   14.9%
TCF1Ex2       42bp 5' G→A        rs1169294   27.0%
TCF1Ex2       66bp 3' G→C        rs12427353  9.5%
TCF1Ex3       98bp 5' G→A        rs1169300   27.0%
TCF1Ex3       51bp 5' T→A        rs2071190   25.7%
TCF1Ex3       23bp 5' C→T        rs1169301   27.0%
TCF1Ex4       G288G G→C                      29.7%
TCF1Ex5       221bp 5' G→A       rs10849828  28.4%
TCF1Ex6       101bp 3' Δ(TCAT)2  rs10688575  31.1%
TCF1Ex7       96bp 5' G→T/A      rs3213547   6.8%
TCF1Ex7       L459L C→T          rs2259820   28.4%
TCF1Ex7       S487N G→A          rs2464196   28.4%
TCF1Ex7       7bp 3' G→A         rs2464195   32.4%
TCF1Ex7       119bp 3' G→T       rs2259816   32.4%
TCF1Ex8/9     T515T G→A                      10.8%
TCF1Ex8/9     29bp 3' G→A        rs1169304   27.0%
TCF1Ex10      158bp 5' Δ(TT)     rs3999412   29.7%
TCF1Ex10      24bp 5' T→C        rs735396    33.8%
HNF4αEx1      33bp 3' G→A                    5.4%
HNF4αEx2      226bp 5' C→T       rs3212179   17.6%
HNF4αEx2      185bp 5' C→G       rs3212180   21.6%
HNF4αEx2      38bp 5' T→C        rs736824    44.6%
HNF4αEx2      35bp 5' C→T        rs745975    14.9%
HNF4αEx2      A67A C→T           rs736823    5.4%
HNF4αEx4      T139I C→T          rs1801961   5.4%
HNF4αEx4      140bp 3' C→G       rs11574738  18.9%
HNF4αEx6      141bp 3' A→G       rs6103731   44.6%
HNF4αEx6      196bp 3' G→A       rs11086925  8.1%
HNF4αEx7      169bp 3' A→G       rs3212201   43.2%
HNF4αEx10     151bp 5' A→C                   9.5%
HNF4αEx10     145bp 5' T→C       rs3746574   33.8%
HNF4αEx10     67bp 5' G→A        rs3746575   47.3%
TCF2Ex3       49bp 5' ΔTCTG      rs34052308  17.6%
TCF2Ex6       27bp 3' T→C        rs2107133   5.4%
TCF2Ex8       48bp 3' Ins C      rs35913775  10.8%
TCF2Ex9       22bp 5' C→T        rs3110641   27.0%
TCF2Ex9       99bp 3' A→C        rs2229295   16.2%
TCF2Ex9       100bp 3' A→G       rs1800929   24.3%
TCF2Ex9       274bp 3' A→T       rs2689      29.7%
TCF2Ex9       384bp 3' A→G       rs1058166   12.2%
IPF1Ex2       128bp 5' C→G                   20.8%
IPF1Ex2       70bp 5' G→C        rs9805632   15.3%
ND1Ex1 (UTR)  37bp UTR A→G       rs8192555   17.6%
ND1Ex2        A45T C→T           rs1801262   21.6%
PAX4Ex1       170bp 3' C→G       rs698406    25.7%
PAX4Ex2       Q173Q A→G          rs327517    5.4%
PAX4Ex2       96bp 3' C→:        rs3835004   25.7%
PAX4Ex8       63bp 5' C→T        rs327518    34.7%
PAX4Ex9       P321H C→A          rs712701    23.0%

TABLE A3. Distribution of sequence variants according to type and frequency

Variants   Frequency   Total
SNPs       ≥ 5% MAF    54
INDELs*    ≥ 5% MAF    5
SNPs       < 5% MAF    40
INDELs     < 5% MAF    0
Total                  99

* Insertions or deletions

TABLE A4. Distribution of sequence variants according to gene location

Variant Location       Total   Variants per Kilobase
Proximal Promoter      1       0.5
Intronic               66      5.8
Exonic:
  cSNPs – missense     12      1.3
  cSNPs – silent       9       1.0
  UTRs                 11      7.4
Total                  99      4.1
